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Abstract. With the aid of index functions, we re-derive the ML(n)BiCGStab algorithm in [39] in 
a more systematic way. It turns out that there are n ways to define the ML(n)BiCGStab residual 
vector. Each definition will lead to a different ML(n)BiCGStab algorithm. We demonstrate this 
by presenting a second algorithm which requires less storage. In theory, this second algorithm 
serves as a bridge that connects the Lanczos-based BiCGStab and the Arnoldi-based FOM while 
ML(n)BiCG a bridge connecting BiCG and FOM. We also analyze the breakdown situations 
from the probabilistic point of view and summarize some useful properties of ML(n)BiCGStab. 
Implementation issues are also addressed. 



1. Introduction 

If we express the BiCG [5|I15] residual as r^ lCG = pk(A)ro in terms of a polynomial, the residual 
vector rk of a Lanczos-type product method 2 based on BiCG is defined to be = 0&(A)pfc(A)rQ 
where 0&(A) is some polynomial of degree k with 0^(0) = 1. In CGS [29|, (j)k = Pk- Since, in 
every iteration, CGS searches for an approximate solution in a larger Krylov subspace, it often 
converges much faster than BiCG. However, CGS usually behaves irregularly due to a lack of a 
smoothing mechanism. In BiCGStab [32J, the (pk is 



(1.1) MX) 



1 if As = 

(p fc A + l)0fc_i(A) iffc>0. 



Here pk is a free parameter selected to minimize the 2-norm of r BiCGStab - m ^ e fc-iteration. As a 
result, BiCGStab is generally more stable and robust than CGS. BiCGStab has been extended to 
BiCGStab2 [?] and BiCGStab(Z) [24,28j through the use of higher degree minimizing polynomials. 
In BiCGStab2, the is defined by the recursion 



<PkW 



1 if k = 

(pkX + l)0fc_i(A) if A; is odd 

((a k X + /3 fe )(p fc _iA + 1) + 1 - /9 fe )0 fc _ 2 (A) if k is even. 



The parameters are again chosen to minimize BiCGStab2 residuals. Likewise, BiCGStab(Z) defines 
its 4>k as 

1 if k = 

(1 + X^=i a j^)<ftk-i(ty if k is a multiple of I 

where the parameters in the factor 1 + X]j=i yields an /-dimensional minimization in every 
Zth step. BiCGStab2 and BiCGStab(Z) usually converge faster than BiCGStab because of smaller 



fc (A) 



^This paper was presented in Gene Golub Memorial Conference, Feb. 29-Mar. 1, 2008, University of Mas- 
sachusetts, Dartmouth, U.S.A.. 

2 For this type of Krylov subspace methods, one can consult [5]. They are called hybrid BiCG methods in [28| . 
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residuals in magnitude while avoiding near-breakdowns caused by a possibly too small pt- CGS, 
BiCGStab and BiCGStab2 have been summarized and generalized by GPBi-CG [JT]. Here ^ is 



GPBi-CG will become CGS, BiCGStab or BiCGStab2 when the a,j3,p are appropriately chosen. 
For detailed descriptions of these and other product-type methods, one is referred to [6ll8ll2ip23|l33] 
and the references therein. Moreover, a history of product-type methods can be found in [TO]. The 
history starts three decades ago with IDR [37] method which can be considered as the predecessor 
of CGS and BiCGStab [25]. Recently, IDR has been generalized to IDR(s) with a shadow space 
of higher dimension, see [25, 31 , 35J . IDR(s) has close relations with ML(s)BiCGStab. 

Generalizations of BiCGStab to methods based on generalizations of BiCG have been made. 
For example, BL-BiCGStab [1] is a BiCGStab variant built on the BL-BiCG [UJ for the solution 
of systems with multiple right-hand sides. ML(n)BiCGStab [39] is another BiCGStab variant 
built on ML(n)BiCG, a BiCG-like method derived from a variant of the band Lanczos process 
described in [1] with n left-starting vectors and a single right-starting vector. 

The derivation of the ML(ra)BiCGStab algorithm in [39J was complicated. In this paper, we 
exploit the concept of index functions to re-derive the algorithm in a more systematic way, step 
by step. Index functions were introduced in [38] by Boley for the purpose of simplifying the 
development of the transpose-free multiple starting Lanczos process, and they proved to be very 
helpful. 

It turns out that the definition of the ML(n)BiCGStab residual vector in [39] is not unique. 
There are at least n different ways to define r&. Let r/% be the residual of ML(n)BiCG and 4>k(^) 
as in (jl.ip . Then, the ML(n)BiCGStab residual in [39J is defined as 



where k = jn + i, 1 < i < n, j = 0,1,2, ■■■ . Starting from k = 1, let us call every n consecutive 
iterations an iteration "cycle" . For example, iterations k = 1,2, ■ ■ ■ ,n form the first cycle, itera- 
tions k = n + 1, n+2, • • • ,2n the second cycle and so on. Then definition (|1.2[) increases the degree 
of 4> by 1 at the beginning of a cycle. One actually can define by increasing the degree of (j) 
by 1 anywhere within an iteration cycle. Each definition will lead to a different ML(n)BiCGStab 
algorithm. As an illustration, we derive a second ML(n)BiCGStab algorithm associated with the 
definition 



(jl.3p increases the degree of ^ by 1 at the end of a cycle. The resulting algorithm requires about 
25% less storage (not counting the storage of the coefficient matrix and the preconditioner) than 
the algorithm associated with definition (II. 2D . However, one drawback with this storage-saving 
algorithm is that, in some experiments, its computed residual can easily diverge from the 
corresponding exact residual when n is moderately large. 

Both ML(n)BiCG and ML(n)BiCGStab possess a set of left starting vectors (or, shadow vec- 
tors) qi,--- ,q n that can be chosen freely. This freedom appears to be an advantage of the 
methods. It not only helps stabilize the performance of the algorithms, but also allows to see a 
connection between the Lanczos-based BiCG/BiCGStab and the Arnoldi-based FOM. 

Just like BiCGStab, ML(n)BiCGStab can suffer from three types of breakdown, caused respec- 
tively by the failure of the underlying Lanczos process, the non-existence of the LU factorization 
during the construction of ML(n)BiCG and the parameters pt- We prove that the breakdown 
probability is zero when the shadow vectors are selected randomly. 




(1.2) 



r k = cf)j(A)r k 



(1.3) 




if 1 < i < n - 1 
if i = n. 
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The outline of the paper is as follows. In £}2j we introduce index functions. In £j3j we present the 
ML(n)BiCG algorithm introduced in [39], from which ML(n)BiCGStab algorithms are derived. 
In ?J31 we rederive the ML(n)BiCGStab algorithm in [39] by index functions. In £|5l we derive a 
storage-saving ML(n)BiCGStab algorithm from a different definition of the residual vector. In 
£j6j we discuss relationships of ML(n)BiCGStab with some other methods. In £J71 implementation 
issues are addressed. Concluding remarks are made in SJHI 

2. Index Functions 

Let be given a positive integer n. For all integers k, we define 

9n(k) = [(k - l)/nj and r n (k) = k- ng n {k) 

where [_ • J rounds its argument to the nearest integer towards minus infinity. We call g n and r n 
index functions; they are defined on Z, the set of all integers, with ranges Z and {1,2, • • • ,n}, 
respectively. 
If we write 

(2.1) k = jn + i 
with 1 < i < n and j £ Z, then 

(2.2) 9n(jn + i)=j and r n (jn + i) = i. 

Table I2TT1 illustrates the behavior of g n and r n with n = 3. It can be seen that g n (k) has a jump 
when k, moved from left to right, passes a multiple of n. 



k 





12 3 


4 5 6 


7 8 9 


10 11 12 ••• 


9n(k) 


-1 





1 1 1 


2 2 2 


3 3 3 


r n (k) 


3 


1 2 3 


12 3 


1 2 3 


12 3 



Table 2.1. Simple illustration of the index functions for n = 3. 



The following properties can be easily verified by using (|2.2p . 

Proposition 2.1. Let k G N , the set of all positive integers, and s G Mq '■= N L) {0}. 

(a) g n (k + n) = g n (k) + 1 and r n (k + n) = r n (k). 

(b) g n (s + 1) + 1 = g n (k + 1) i/max(/c - n,0) < s < g n (k)n - 1. 

(c) g n (s + 1) = g n (g n (k)n + 1) = g n (k) if g n (k)n < s < k - 1. 

(d) g n (k + 1) = g n {k) + 1 if r n (k) = n. 
g n (k + 1) = g n {k) if r n {k) < n. 

(e) max(fc — n, 0) > g n (k)n — 1 if r n (k) = n or g n (k) = 0. 



3. A ML(n)BiCG Algorithm 

Parallel to the derivation of BiCGStab from BiCG, ML(n)BiCGStab was derived in [39 from 
a BiCG-like method named ML(n)BiCG, which was built upon a band Lanczos process with n 
left starting vectors and a single right starting vector. In this section, we present the algorithm 
of ML(n)BiCG from [39J and summarize some properties of it. 
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3.1. The Algorithm. Consider the solution of the linear system 

(3.1) Ax = b 

where A G qNxN anc j ^ ^ qn ^ Throughout the paper we do not assume the matrix A is 
nonsingular except where specified. The solution of singular systems has been extensively studied 
in area of iterative methods, see, for instance, [2jll4 t [i"6 l ll9 t l36 t l40] . 

Let be given n vectors qi , . . . , q n G C N , which we call left starting vectors or shadow vectors. 
Set 

(3.2) p k = {A H ) 9 " {k \ rn{k) 

for k = 1,2,"' . The following algorithm for solving (|3.ip is from [39|. 

Algorithm 3.1. ML(n)BiCG 3 

1. Choose an initial guess xo and n vectors qi,q2, ■ ■ ■ , q n - 

2. Compute r q = b — Axo and set p\ = qi , go = ?o- 

3. For k = 1, 2, • • • , until convergence: 
4- ctk = pf r fc _i/p^Ag fc _i; 

5. x fe = x fc _i + Qfcgfe-i; 

6. r k = r fc _i - aj.Agj._i; 

7. For s = max(/c — n, 0), • • • , k — 1 

8- # = -P?+iA (?* + ESL(*-n,o) /pf+iAg-' 

9. .End 

^0. gfc = Tfe + Z] s =max(fc-n,0) ^ s 

11. Compute Pfc+i according to 113.2]) 

12. End 

The ML(n)BiCG algorithm is a variation of the classical BiCG algorithm. The left-hand side 
(shadow) Krylov subspace of BiCG is replaced by the block Krylov subspace with n starting 
vectors qi,q 2 , ■•• ,q n : 

B k := the space spanned by the first k columns of [Q, A^Q, (A^ ) 2 Q, • • • ] 
= span{pi,p 2 ,-- ,Pk} 

= Ei=i fe ^ 9 „(i)+i( AK >qt) + EILr^cfej+i^w^^'^) 

where Q := [qi, q 2 , • • • , q n ] and 

JC t (M, v) := span{v, Mv, ■ ■ ■ , M* _1 v} 

for M G C NxN ,\r G C N and t G M. Moreover, in ML(n)BiCG, the basis used for B k is not chosen 
to be bi-orthogonal, but simply the set {pi,P2> ■ ■ ■ ,Pk}- Therefore, the ML(n)BiCG algorithm 
can be viewed as a generalization of a one-sided Lanczos algorithm (see O[20j). The likely ill- 
conditioning of this basis does not matter, as the algorithm is only a technical tool for deriving 
ML(n)BiCGStab and this basis disappears in ML(n)BiCGStab because A^ will be absorbed by 
the residuals and direction vectors of ML(n)BiCGStab. For constructing the right-hand side basis 
consisting of residuals r k , we used recurrences that generalize the coupled two-term recurrences 
of BiCG, that is, direction vectors g^ are also constructed. 

3 A1 gorithm r3. 1 I consists of exact mathematical formulas for at, /3s fc ^ , xjt, and obtained in §3 of [39]. Repeated 
operations should be removed in order to make the algorithm computationally efficient. Moreover, even though 
the algorithm has not been tested, it is believed to be numerically instable because of Line 11 in which the left 
starting vectors are repeatedly multiplied by A H , a type of operation which is highly sensitive to round-off errors. 
The algorithm has been introduced only for the purpose of developing ML(n)BiCGStab algorithms. 
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3.2. Properties. Let v be the degree of the minimal polynomial p m i n (X; A, ro) of ro with respect 
to A (that is, the unique monic polynomial p{\) of minimum degree such that p(A)ro = 0) and 
let 

S„ = [pi,p 2 ,-- - ,p*,] H A[r ,Ar ,--- .A" - ^] 

and 

W„ = [pi,P2, " ,Pu} H [ro,A? , - ■ ■ .A" - ^]. 

Denote by S; and W/ the I x I leading principal submatrices of S u and Wj, respectively. We now 
summarize some useful facts about Algorithm 13.11 They can be derived from the construction 
procedure of the algorithm. 

Proposition 3.1. In infinite precision arithmetic, if Y\d=i det(Sj) det(Wj) ^ 0, then Algorithm 
\3.1\ does not break down by zero division for k = 1, 2, • • • , v, and x u is the exact solution of A3.1\) . 
Moreover, the computed quantities satisfy 

(a) x fc G x + /Cfc(A,r ) andr k = b - Ax k G r + AK. k (A,r ) for 1 < k < v. 

(b) span{r , r x , • • • , r fc _i} = JC k (A, r ) for 1 <k <v. 

(c) span{Ar ,Ar lr ■ ■ , Ar,_i} = JC V (A, r ). 

(d) r k J_ B k and r k JL p k+1 for < k < v - l. 4 

(e) span{g ,gi, ■ ■ ■ ,gfc-i} = /C fc (A,r ) for I < k < v. 

(f) span{Ag ,Agi,--- , Ag^i} = /C^(A, r ). 

(g) Ag fc _L B k and Ag k JL p k+ i for < k < v - 1. 

Because of Proposition I3.1f a) and (d), ML(n)BiCG is an oblique projection Krylov subspace 
method [21] . 
.Remarks: 

(i) The matrices Si and W; have already appeared in [12yi3| where they were called moment 
matrices. Proposition 13.11 can be regarded as a generalization of Theorem 2 in |13| from 
n = 1 to n > 1. 

(ii) Just like BiCG, ML(n)BiCG also has two types of breakdown caused, respectively, by the 
failure of the underlying Lanczos process and the nonexistence of the LU factorizations of 
the Hessenberg matrix of the recurrence coefficients. Both types of breakdown are reflected 
in Algorithm 13.11 by p^Ag^_i = 0. The condition Y\d=i det(Wj) ^ guarantees that the 
underlying Lanczos process works without breakdown, and the condition Y\i = i det(S/) ^ 
ensures that the LU factorizations exist. 

(iii) det(S„) ^ implies that p m i n (0; A, ro) ^ which, in turn, implies that f)3. 1 1) is consistent 
and has a solution lying in xo + K, U {A, ro). 

The derivation of ML(n)BiCGStab will require the following result which, in the case when 
n = 1, has been used in CGS and BiCGStab. 

Corollary 3.1. Let s € N and 

<W A ) = c gn(s) X^ + tV-M-^W- 1 + • • • + co 
be any polynomial of exact degree g n (s). Then, under the assumptions of Proposition HOI 

1 1 
Ps*k = - q^ (s) V> 9 „( s )(A)r fe and pf Ag k = - q^ (s) A^ n(s) (A)g fc 

C 9n(s) C 9n(s) 

if Q <k <v — 1 and s < k + n. 
4 We say that u _L v if u H v = 0. 
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Proof. It is easy to verify that 

Ps - z~— iJ gn{s) {A H )q rn(s) £ B k 

by Proposition 12.1( a) and (|3.2p . where the overbar denotes complex conjugation. The corollary 
then follows from Proposition 13. 1 I f d) and (g). ■ 

Corollary 13.11 essentially says that adding to p s a vector from B^ does not change the inner 
products pfrk and p^Agfc. 

Examples exist where the condition \\^ =1 det(Wj) det(Sj) ^ in Proposition 13.11 holds, as 
shown below. 



Lemma 3.1. Consider the case where n = 1, ro £ 1Z N , tq ^ 5 and A £ -ji^xN ^ s nons i n g U i ar , 
Ifq.i £ TZ N is a random vector with independent and identically distributed elements from iV(0, 1), 
the normal distribution with mean and variance 1, then Prob ^II/li det(W/) det(Sz) =0^ = 0. 

Proof. Since = A 9n ^(\ rn ^) = A fe_1 qi when n = 1, both S u and are Hankel matrices 



•51 



S2 
S3 



Si, 



Sv+1 ' ' ' S2v-\ 
T A t-l< 



W, 



qf A t_i r for t = 1,2, 



W2 Ws 
W v W v+ i 

2v - 1. 



Wu+1 
W2v-1 



Prob (det(Wi) = 01=0 



where = q^A*ro and wt 
We first prove 

(3.3) 

for any fixed I with 1 < I < v. It is trivial that (13. 3f) holds when / = 1 and we therefore assume 
I > 2 in the following discussion. 

By assumption, v is the degree of the minimal polynomial of ro with respect to A. This implies 
that K, := span{A*ro 1 1 £ TVo} is a ^-dimensional space with {ro, Aro, • • • , A^ -1 ro} as a basis. 
Since A is nonsingular, {A z_1 ro, A^ro, • • • , A l+u ~ 2 ro} is another basis of K. 

Perform an orthogonal factorization of the N x v matrix 



J r 



QR 



where Q £ TZ NxN is orthogonal and R £ 7Z Nxu is upper triangular with positive main diagonal 
elements r%\, t-zi, ■ • • , r uv . Clearly, the first v columns of Q form a basis of K, and the last N — u 
columns belong to K . 
Write 



A-2 



r = eiA'-^o + &A% + • • • + ^A'+^ 2 r 
= [A l - l v Q ,A%,..- ,A+^ 2 r ]£ 
= QR£ = Qr/ 



for some scalars £i,£2> ■ ■ ■ £ where £ = [£i,£2, 
R£ £ 1Z N . Since A is nonsingular and {ro,Aro,- 
£ v 7^ and hence n u = r uu £ u ^ 0. Let 9 = [Q\, 62, ■ ■ ■ 



■ , £ V ] T £ W and n = [771,772, • • • , r] N ] T = 
,A 1/ ~ 1 ro} linearly independent, we have 
9jv] t = Q r qi. Then 6 is a random vector 



In Lemma 13.11 and Theorem 13.11 ro can be any non-zero vector in 1Z N , not necessary to be a residual vector 
like r = b — Axq- 



w t 
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with iid elements from N(0, 1) [3]. We now express det(W^) in terms of the elements of 9. Let us 
write 

[WI,W 2 ,--- ,Wl-2,Wl-l,Wl,--- ,W2l-l] 

= qf[r , Ar , • • • , A^o, A'- 2 ? , A*" 1 ^, • • • , A 2 '- 2 ? ] 

= qf[r , Ar 0! • • • , A'- 3 ? , Qrj, QR«] 

= qfQ[Q r [r 0) Aro,--- ,A'- 3 fo],r?,RW] 

= T [Q T [r o ,A?o,--- .A'-^ol.^R^] 

where R^ denotes the matrix consisting of the first I columns of R. Since the last N— v columns 
of Q belong to K^, the last N — v rows of the matrix Q T [ro, Aro, ■ ■ ■ , A' _3 ro] are zeros. Similarly, 
the last N — v elements of r] = R£ are zeros because the last N — v rows of R are zeros. We 
therefore have 

a linear combination of 9\, 9%, ■ ■ ■ , 9 V if 1 < t < I — 2, 

+ V2O2 H 1- f]v9 v with rj u ^ if t = I — 1, 

r\,t-i+\9i + r 2 ,t-i+i92 H h r t _| + i )t _j + i^_j+i 

with r t _ J+ i )t _i+i ^ if I < t < 21 - 1. 

This shows that none of the random variables 6 u +i, 9 u + 2 , • • • > On is involved in any of the w's. In 
more detail, when I < v, w t = Wt(.Oi,0 2 , ■■■ , 9 U ) if 1 < t < I — 1 and w t = w~t(9i,0 2 , • • • , fl^-i) 
if Z < i < 21 — 1; when I = u, Wt = Wt(9i,9 2 , ■ ■ ■ ,9 U ) ii 1 < t < v — 1 or t = 2v — 1, and 
= w t (di,02,-- ■ A-ljJf i/ < t < 2i/ — 1. 

We now expand det(W/) by minors down its last column and write it into a polynomial in 0„. 
This yields 

(-l)^('+ 1 )+ 1 det(W;) 

= w 2 i-iw\z\ H 

(EUl rrf^)^" 1 ^- 1 + q-20!r 2 + • • • + ci0„ + co if 2 < Z < i/, 
r w ^ -1 ^ + ttz-i^ -1 + ' ' ' + di9 u + d iil = v 

where the coefficients cq, ■ ■ ■ , q_2 and do, • • • , g^-i are polynomials in 9i,0 2 , • • • , Now (|3.3p 

follows from the facts that ru ^ 0,r uu ^ 0,n u ^ and 9\,9 2 ,-' ,9 U are independent random 
variables. 

Note that v is also the degree of the minimal polynomial of Aro with respect to A when A is 
nonsingular. With tq replaced by Aro m t|3.3j) . we then have 

(3.4) Prob fdet(Sj) = o) = 



for any I with 1 < Z < v. 

Now, (]3.3[) and (|3.4p together imply that 



Profc- f n/=l det(Wj) det(Sj) = 
< ET=i (det(W,) = 0) + £r=i Pro6 (det(Si) 
and the lemma is proved. 

The A in Lemma 13. II is assumed to be nonsingular. For a general A, we have 



Theorem 3.1. Consider the case where n = 1, r € K N , f / and A G ^ 7Vx7V . 7/ qi G 
72.^ is a random vector with independent and identically distributed elements from N(0, 1), then 

Prob (nr=i det(W,) det(Sf) = o) = »/ and onZy i/p min (0; A,r ) / 0. 



ML(n)BICGSTAB: REFORMULATION, ANALYSIS AND IMPLEMENTATION 



8 



Proof. If 

Pmin{0'i A, i"o) — 0, then A^ro is a linear combination of Aro, • • • , A^ ^ro or A^ro — in 
the case when v = 1. Hence det(S^) = no matter what qi is and therefore Prob (jY^i det(Wj) det(Sj) 
1. 

We now suppose p m in(0] A, r"o) 7^ 0. By the real version of the Schur's unitary triangularization 
theorem (see, for instance, [H]), A can be decomposed as 

Bn B12 
B 22 

where Q E j^NxN ^ g orthogonal, Bn E j^rnxNi n0 nsingular and B22 E jiN 2 xN 2 strictly upper 
triangular (namely, an upper triangular matrix with its main diagonal elements zero). Let ro = 
Qr = [foi,r^ 2 ] T where r i E TZ Nl andr 02 E TZ N2 . Then p min ( B ; A , r ) r = Qp min (A; A, r )r = 
0. Note that 



Q 



Q 



Q T BQ 



(3.5) 

for k E Af, we have 



B 



Bk 
22 



p min (B; A,r ) 



Pmin (B11 ; A, r ) 



Thus, p m j n (B; A, ro)ro = implies that p n 



Pmin (B22; A,r ) 

(B 22 ; A,r )r 02 = 0. If we write p. 



mm 

Ylt=o C *A* w ith Co / 0, then Ylt=i C *B 22 is a strictly upper triangular matrix. Thus, p m i n (B 22 ; A, ro) 
(J2t=i C *B 22 ) + col is an upper triangular matrix whose main diagonal elements are cq. So, 
Pmin(B 22 ; A, ro) is nonsingular and therefore Pmin 

(B 22 ; A, r )r 02 = yields r 02 = 0. Since r 7^ 
due to ro 7^ by assumption, ?o 2 7^ ?o. In other words, N2 < N or Bn is not a null matrix. 
Now that ro 2 = 0, (|3.5p implies that 



(A; A,r ) 



(3.6) 



B fc ? 



Bfxroi 




for k E Af. Therefore, p(B)ro = [(p(Bn)roi) , ] for any polynomial p(A). Thus, the minimal 
polynomial of ro with respect to B is equal to the minimal polynomial of ¥qi with respect to Bn. 
This implies that, v, the degree of the minimal polynomial of ro with respect to A, is also the 
degree of the minimal polynomial of roi with respect to Bn. 

We now set 8 = Q qi = [dj, &2} T where 9\ E TZ Nl and 82 E 1Z N ' 2 . Since qi is random with iid 
elements from iV(0, 1), so is 8. By (|3.6p , 



gTxjfcc 



B fc r = 6»fB n r i and qfr 



where k E Af. Thus 

S v (A,r ,qi) = S v (Bii,r i > (9i) and W„(A, r , qi) = W„(Bn, r 01 , 8 X ). 



Now, the desired probability follows from Lemma 13 . 1 1 because Bn is nonsingular, 8\ is iid iV(0, 1) 
random and v is the degree of the minimal polynomial of Fqi with respect to Bn. ■ 



Extension of the theorem to the general case should be possible, namely, n > 1, A E C ,tq E 
C Nxl and [qi, • • • , q n ] is a Gaussian matrix. We remark that, when A is non-defective, the general 
case has been proved in the proof of Theorem 3 of [30] . The proof was based on the observation 
that, if a polynomial p(Ai, • • • , A;) ^ 0, then Prob (p(Ai, • • • , A;) = 0) = when Ai, • • • , A; are 
randomly chosen. 

Remark: p m j n (0; A,ro) 7^ if and only if the affine space xo + span{A*ro|i E Afo} contains a 
solution to (|3.ip . 
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The following corollary then follows from Proposition 13. II and Theorem 13.11 



Corollary 3.2. In the case where n = 1, \'3. 1)) is a real system and qi G 1Z N is a random vector 
with iid elements from A(0, 1), Algorithm VJ.1\ almost surely works without breakdown by zero 
division to find a solution from the affine space xo + span{ A*ro|t £ Wo} provided that xq £ 1Z 
is chosen such that the affine space contains a solution to \3.1\) . 



Remarks: 

(i) The initial guess xo in Corollary 13.21 is a user-provided vector. It may not be a random 
vector in some applications. For example, in cases where a sequence of similar linear 
systems is solved, the solution from the previous system may be used as the xo for the 
new system. 

(ii) If we pick xo £ TZ randomly and set qi = b — Axo, then Algorithm 13.11 with n = 1, 
or equivalently in mathematics, the standard BiCG (see $6]), almost surely solves f)3. 1 1) 
without breakdown by zero division for all, but a certain small class of, nonsingular A € 
K NxN . For details, see p3]. 

4. A ML(n)BiCGSTAB Algorithm 

An algorithm for the ML(n)BiCGStab method has been derived from ML(n)BiCG in |39| 
(Algorithm 2 without preconditioning and Algorithm 3 with preconditioning in |39|). but the 
derivation there is complicated and less inspiring. In this section, we re-derive the algorithm in a 
more systematic fashion with the help of index functions. 

4.1. Notation and Definitions. Let <j) k {\) be the polynomial of degree k defined by (jl.lj) . If 

expressed in terms of the power basis 

(4.1) MX) = c px k + ... + c[ k) \ + c { *\ 
it is clear that c[ = pip2 ■ ■ ■ p k and Cq = 1. Thus, 

(4.2) i fe )=p fc e/). 

In ML(n)BiCGStab, we construct the following vectors: for k € JV, 

rfc = 9n ( fe )+i(A)r fc , u k = gn ( k )(A)r k , 

(4.3) g fc = g „(jfc) + i(A)g fe , d k = p gn{k)+1 A(j)g nik) (A)g k , 
w fc = Ag k 

and for k = 0, set 

(4.4) r = r and go = go- 

The vectors r k will be the residual vectors of the approximate solutions x^ computed in the 
following ML(n)BiCGStab algorithm. 

4.2. Algorithm Derivation. The derivation parallels the one of BiCGStab from BiCG. We first 
replace all the inner products p^r and p^Ag in ML(n)BiCG respectively by inner products of 
the forms q H 4>(A)r and q^Ac/>(A)g, where 4> is the polynomial (jl.ip . Corollary 13.11 guarantees 
that the inner products remain unchanged with such replacements. Then we compile recurrences 
for the new residuals r/% and the corresponding iterates. The overall derivation is best described 
and verified in stages, and depends on Proposition 12.11 and Corollary 13.11 

The derivation is complicated by the fact that the recurrences in the kth iteration in ML(n)BiCG 
involve n terms which stretch from k — n to k — 1. Note that k — n< g n {k)n < k — 1. The degrees 
of the 4>g n ( s ) and 4> gn ( s )+i m (|4.3|1 are increased at g n (k)n + 1 as s runs from k — n to k — 1 (see, 
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for example, Table [27L]) . Therefore, our first task in the derivation is to split up in ML(n)BiCG 
the loops and the sums of length n into two parts, one from k — n to g n (k)n — 1 and the other 
from g n {k)n + 1 to k — 1. The following Derivation Stage (DS) #1 is computationally equivalent 
to Algorithm 13.11 (forgetting Lines 1, 2, 5 and 11). 

Derivation Stage #1. 

1. For k = 1, 2, • • • , until convergence: 



2. Ifr n (Jfe) = l 

3. a k = pfr fe _ 1 /p| r Ag fc _i; 

4. r k = r fc _i - a fc Ag fc _i; 

5. Else 

6- a k = pf r fe _ x /pf Agfc_i; 

7. r fc = r fc _i - a fc Ag fc _i; 

8. End 

9. If r n (k) < n 

10. For s = max(/c — n,0), • • • ,g n (k)n — 1 

11" ^ = -Pf+IA (?* + £ t =Lc( fe -n,0) fl!*'*) /P? + lAg S ; 

12. End 

10 _ a j_\^9n(k)n-l a( k )ff\ l n H \~ 
P 9n(k)n ~ Pg n (k)n+1 A t Z^ =ma x(fc-n,0) ^ ^ / Pg„(k)n+1 A &3n(A>' 

14. For s = g n {k)n + 1, • • • , A; - 1 

15- = -Pf +1 A (r, + EtmS( fc -n,o) + fc W *) / P f +1 Ag s ; 

16. End 

17 a — r J- \^9n(k)n fl( fe )s i \^k~l «( fe )ft . 

11 ■ 8fc - Ifc + L s=max ( fc _^ ) Ps Bs ^ 2^ s =g n (k)n+l P s Ssj 

18. Else 

19 - ^g n (k)n = _P 9l(A:)n+l Arfc /Pgl(A:)n+l A S9„(fc)n; 

20. For s = g n {k)n + 1, • • • , k - 1 

21- = -pf+iA (?* + ^(^(fcjn + ES n(fe ) n+ i /pf+i Ag s ; 

22. End 

gfc - r fc + P 9n (fc) n S9n(fc)n + 2^ s =g n (k)n+l Ps Ss', 

24. End 



25. End 

We have adopted the conventions: empty loops are skipped and empty sums are zero. These 
conventions will also be applied in the sequel. 

In the next stage of the derivation, we replace inner products p^r and p^Ag by inner prod- 
ucts of the forms q^0(A)r and q^A^(A)g respectively. That is, the factor (A^) 9 ™^ that is 
hidden in the left basis vector p^, is moved to the right-hand side space and replaced by the factor 
4> gn{k) {A). Formally, by Corollary O together with (jO) and Proposition E^a), DS#1 

can be further transformed into the version below. Explanations are given after listing. 

Derivation Stage #2. 

1. For k = 1, 2, • • • , until convergence: 

2. Ur n (k) = l 

3 - «fc = q^(fe)^„(fc)(A)? fc -i/q^( fc )A0 9n{A;) (A)g fc _ 1 ; 

4 - 4> 9n {k){A)r k = gn(fc) (A)r fc _i - a k A<f> gn ( k )(A)g k -i] 
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5- (j) gn ( k)+1 (A)? k = (p gn{k)+1 A + I)0 ffn(fc) (A)r fc ; 

6. Else 

7 - «fc = q^(fc)^„(fc)(A)?fc-i/q^( fc) A^ n(fc) (A)g A ,.„ 1 ; 

8 - Sn (fc)( A ) ? it = ( / , a „(fe)( A )rfc-i - afeA0 gn(jfc )(A)g fc _i; 

9- 9n(fc)+1 (A)f fc = 9n(fc)+1 (A)r fc _i - Q A .A(/> 9n(fc)+1 (A)g fc _i; 

10. End 

11. If r n (k) < n 

12. For s = max(k — n, 0), • • • , g n (k)n — 1 

is- # = -qf n(s+ i) (<^( S +i)+i( A )^+ 

E?=max(fc-n,0) ^ Vg„(H-l)+l Mg n (s+1) ( A )gt) / p 9n(s+1)+1 q^ (s+1) A0 Sn(s+1) (A)g s ; 

14. End 

Ef=max(fc-n,0) / 3 t' C V(/ n (fc)+l A 0g n (fc)( A )it) / P 9 „(fc) +1 qf A0 ffn ( fc ) (A) 

16. For s = g n (k)n + 1, • • • , k — 1 

17 - tf = -<( S+ 1) (0 9n (-+l)+l( A )^ + Etmix(fe-n,0) V 9n («+l)+l A ^n(,+l)( A )gt + 

E?i(fc)„ + 1 V S „(« + l) + l A ^n( S + l)( A )gt) + l)+l < l^(s+l) A ^Sn(s+l)( A )gs; 

18. End 

19 - Pg n (k)+l A <t>g n (k)( A )Sk = pg n ( k ) +1 A(f> gnik) (A)? k + 

E^ ma x(fc-n,0) & Pg n (k)+l A( t>g n {k){ A )%s + 
Es=g n (fe)n+1 ffl ' Pgn(k)+l A 4>g n (k)(A-)g s ; 

20. ^, l( fc)+l( A )gfc = K(k)+l( A )rk + E«=max(fc-n,0) ^V Sn (fe)+l ( A )gs + 

Es=g n (fc)n+1 / 3 s fe V 9n (fc)+l(A)g s ; 

21. Else 

22 - ^(^n = -qf ^„(fc)+i( A )rfc/p 9n (fc)+iqf A0 9ii(fc) (A)g 9n(fc)n ; 

23. For s = g n (k)n + 1, • • • , k — 1 

24 - ffl = -q^( s+ i) (^ 9 n(s+i)+i( A )rfe + ^ ) {A . )n P ffn ( s+ i) + iA^ n(s+1) (A)g ffn(fe)n + 

E?i( fc )n + i^f ) Ps„( s +i)+l A a „( S +l)( A )gt) /p 9 „(s+i)+iq^ (s+1) A</.« ?n(s+ i ) (A)g s ; 

25. End 

26. gn{fc)+ i(A)g A; = an(fc)+1 (A)r fe + pf\ k)n <t> 9n {k)+i( A )%g n {k)n+ 

Es=g n (fc)n+i / 3 s fe V 9n (fc)+i(A)g s ; 

27. End 

28. End 



Lines 4, 8, 9, 19, 20 and 26, DS#2, were obtained from Lines 4, 7, 17 and 23, DS#1, through 
a multiplication by <p gn(k) {A), <^ n(fc)+1 (A) and p Sn(fe)+1 Ac/> gn(fc) (A) respectively. Line 5, DS#2, is 
a direct result of the definition (jl.ljl of 0. These lines are prepared for the updates of the vectors 
defined in (j4T3|) . 

To help understand how DS^l is turned into DS#2, let us demonstrate (i) the transformation 
of Line 3, DS^l, into Line 3, DS#2 and (ii) the transformation of the term ^ n+1 Ar k on Line 

13, DS#1, into the term ' 4> gn ^ + i(A)r k on Line 15, DS#2, as follows. 
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(i) By Corollary EU 

pfAg fc _! ?5 ^q£ (fc) A^ fl(fc) (A)g fc _i q^ (fc) A0 9n(fc) (A)g fc _ 1 

9n(fe) 

where c^f^ is the leading coefficient of <ftg n (k)(^) (see Q4.ip ). 

(ii) By (|3.2p and Proposition 12.1( a). we have 

Vg n (k)n+1 ^ J Mr n (g n (fc)n+1) 

^ ) Hr n ((g n {k)+l)n+l) 
= P(g n (k)+l)n+l- 

Hence p^ (fe)n+1 Ar fc = p^ (fc)+1)n+1 r fc . Since (g n (k) + l)n + 1 < fc + n, an application of 
Corollary O to p^ n ( fc ) +1 ) n+1 rfc thus yields 

P gl(fc)n+l A ^ A: = ,.(9n((su(fc)+l)»+i))^(( 9n (fc)+l)n+l)^n((<?n(fe)+l)n+l)(A)rfc 

9n((9n(fc) + l)«+l) 
= r <,9nW + l) qf </ ) 9 n(fc) + l(A)r fc . 
9n(k) + l 

The second equation above follows from (12. 2D . The coefficient V c j (i^ 1 ' is missed from 
Line 15, DS#2, because it was canceled out by the coefficient from the denominator. 



Our goal is to establish updating relations for the quantities introduced in (14, 3p . To this end, 
we further transform DS#2 into the following version. This time, we work on the index function 
g n with the aid of Proposition 12.11 so that the definitions in (14, 3p can be applied. Again, further 
explanations are given after the listing. 

Derivation Stage #3. 

1. For k = 1, 2, • • • , until convergence: 

2. Ifr n (fc) = l 

3 - a k = q^( fc )^ n (fe-i)+i( A ) ? ^i/q^(fc) A ^n(fc-i)+i(A)gfc-i; 

4 - ^s™(fc)(A)?jfc = 5 „( fe _i) + i(A)r fc _i - atfeA(^ ffB ( fc _ 1 ) +1 (A)g fc _i; 

5- (f) gn{k)+1 {A)T k = (p gn{k)+1 A + I)(j)g n{k) (A)? k ; 
6. Else 

7 - a* = qS(fe)^s„(ifc-i)(A)r fc _i/q^ {jfc) A0 Sn(fc _ 1) (A)g fc _ 1 ; 

8 - 0fln(k)( A )** = 5n(fe _ 1 )(A)r fc _i - afcA^ 9n ( fc _ 1 )(A)g fc _i; 

9- ff „(fe)+i(A)r fc = 9n(fc _ 1)+1 (A)r fc _i - a fc A < /> ffn ( fc _ 1)+1 (A)g fc _i; 

10. End 

11. If r n (k) < n 

12. For s = m&x(k — n, 0), • • • , g n (k)n — 1 

is. 4 fe) = -q^( S+ i)(^„w(A)^+ 

E t S =i ax (fc-n,0) ft^ ) / 9 g n (t)+lA^ n(t )(A)g t ) / p ffn ( s )+iq^ (s+1) A<^ n(s ) (A)g s ; 

14. End 

is- ^! ) {fe )„ = -qf(^«+i( A )^+ 

Ef=mS(fe-n,o) /^f ) / 9 9n (A ; )+iA0 9n ( t)+ i(A)g^ / p gn (fc)+iqf A0 9n(ffn ( fe ) n)+ i(A)g ffn(fe ) n ; 
16. For s = g n (k)n + 1, • • • , k — 1 

17 - ^ = -<( s+ i) f^„(fc)+i(A)r fc + Et:Sx(fc-n,o) /^f ) P Sn (fc)+i A ^n(t)+i(A)gt+ 
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8-1 d k ). 



£t=<L(fc)n+l Pt ' Pg n (t)+lMg n (t)(A)g t ) / Pg n (s)+l<^ n{8+1) Mg n (s) ( A )U S ; 

18. End 

19 - Pg n (k)+lMg n (k)(A)g k = p gn ( k ) +1 A(f> gnik) (A)? k + 

v-^g n (fc)n n(k) . / 

Z^= ma x(fc-n,0) P g Pg n (k)+l A( Pg n (s)+l{^)Ss + 



20. 



'9: 



's=max(i- n,0) 

Es=J„(fe)„ + i/ 3 8 /c) P g „( s )+iA0 Sn(s )(A)g s ; 

,.(fc)+l(A)gfc = 9n(fc)+ i(A)? fc + Ef=m a x(fc-n,0) ^« A + I )^W+l( A )&+ 

Es=g n (fc)n+1 / 3 8 &) 0g„( s )+l(A)g s ; 

21. Else 

22 - ^!(fe)n = ~qf ^3n(fc)+i( A )rfc/Pg n (fc)+iqf A0 gn{Sn(fc)n)+1 (A)g 9n(fc)n ; 

23. For s = g n (k)n + 1, • • • , k — 1 

24 - = -^(s+i) (^n(fc)+i( A ) ? fc + /3^ ) (fc)n /5 gn (fc)+iA0 9n(9n(fc)n)+1 (A)g 9n(fc)n + 

E^i^n+i 0? V 9n (i)+iA0 9n(4) (A)g t ) / p ffn(s ) + iq^ (s+1) A0 ffn(s) (A)g s ; 

25. End 

26 - aiI (fc)+l( A )^' = ^(fe)+l( A ) ? * + / 3 Sfc)n(^(fc)+l A + I )^n(9„(fc)n)+l(A)g 9n(fc)n + 

S s=9n (fc) n +1 A (s)+l( A )g s ; 

27. End 

28. End 

As an example, let us show how the g n {s + 1) inside the sum Ylt=max(k-n ) ' ' ' on Line 13, 
DS#2, was written as the g n (t) on Line 13, DS#3. 

If g n (k) = 0, Line 13 of DS#2 is not implemented because of the conventions immediately 
following DS#1. So, we assume that g n (k) > 0. Since 

max(/c — n, 0) < s, t < g n (k)n — 1, 

we have 

g n {s + 1) = g n {k + 1) - 1 = g n {t + 1) 

by Proposition 12.1( b). Now that g n (k) > 0, max(A; — n, 0) = k — n and hence 

(4.5) k - n < t < g n (k)n - 1. 

Let k = jn + i as in (|2.ip . Then (|4.5p is 

(j — l)n + ? < i < (j ; — l)n + n — 1 

which implies that r n (t) < n. Now, Proposition 12.1( d) yields g n (t + 1) = g n (t) and therefore we 
have g n (s + 1) = g n (i). 

Now we are ready to use the vectors defined in (j4.3|) and (|4.4p . Substituting these vectors into 
DS#3 leads to the following stage. 



Derivation Stage #4. 

1. For k = 1, 2, • • • , until convergence: 

2. Ifr„(fc) = l 

3 - «fc = q^(fc)rfc-i/q^(fc)Ag fc -i; 

4. u k = r fc _! - a fc Ag fc _i; 

5- r fc = /5 Sn(fc ) +1 Au fc + u fe ; 

6. Else 

7. « fc = P ffn (fc-l)+iq^(fc) u fc-l/q^(A:) d A;-i; 
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8. Ufc = — (ak/pg n (k-i)+i)dk-r, 

9. r k = r fe _i - a fc Ag fc _i; 

10. End 

11. If r„(A;) < n 

12. For s = max(/c — n, 0), • • • , g n (k)n — 1 

13 - = -1r n (s+l) ( U *: + Et=max(fc-n,0) Z^" 1 *) / ^(s+l)^ 

14. End 

_ („. _i_ „ , , r^Sn(fc)n-l „(fc) 



24 



26. 



16. For s = g n (k)n + 1, • • • , k — 1 

i7 - #» - -Qr„( S +l) ^ + Pgn(k)+l 2^t= m ax(fc-n,0) ^ A S* + 

Et=sn(fe)n+i Pt )d *J /q^( s +i) d s; 



18. End 

19. d fe = r fc - u k + p 9n (fc)+i Es=ma™( fc _ n ,o) ^ &) Ag s + Es=<L(fc)n+i ^- fc) d«; 

20. g fc = r k + Zt^U-n,o) W)+i A + I)g- + E^„ (fc) „ +1 

21. Else 

22 - ^!(fe)n = -qf^M/nW+iqf A s 9 n(fc)n; 

23. For s = g n (k)n + 1, • • • , A; — 1 

= _C ^(s+l) + ^n(fe)+l^Sit)n A Sp„(&)n + E t S =g n (fc)n+l ft^ )d *) / q^( s+ l) d s 



25. End 



g fc - r k + Pg n{k)n {Pg n {k)+l-k + I)gft,(*)n + £ s=ffn (fc)„ + i ft gs 



27. End 

28. End 

We consider to be the residual of the fcth approximate solution x k . Updating relations for 
Xfc can be obtained from Lines 4, 5 and 9 respectively: 



(4.6) x fc 



x fc _i - Pg n ( k )+iu k + ajfcgfe-i, if r n (k) = 1 
Xfc-i + ajfcgfc-i, if r n (/c) > 1. 



After adding (j4.6|) to DS#4 and simplifying the operations appropriately, we arrive at the fol- 
lowing ML(n)BiCGStab algorithm. Just like BiCGStab, the free parameter p gn ( k )-\-i on Line 5, 
DS#4, is chosen to minimize the 2-norm of r^. 



Algorithm 4.1. ML(n)BiCGStab without preconditioning associated with definition 

1. Choose an initial guess xo and n vectors qi,q2, ■ ■ ■ , qn- 

2. Compute tq = h — Axo and set go = ro- Compute wo = Ago, cq = qfwo- 

3. For k = 1, 2, ■ ■ ■ , until convergence: 
I Ifr n (k) = l 

5. a k = q^ (fe) r fe _i/c fc _i; 

6. u fc = r fc _i - afcWjt_i; 

7. x fc = x fc _i + a fc g fc _i; 

Pg n (k)+i = -(Au fc ) H u fc /||Au fc |||; 

9. x fc = x fc - /5 9n (fc)+iU fc ; 

10. r fc = p 9n(A . )+ iAu fc + u fc ; 
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Else 

5 ^ = qf„(t) u wM-b' %a k = otk/p gn {k-i)+i 

Ifr n (k) < n 

Ufc = Ufc_i — ajfcdfc_i; 
End 

Xfe = X fc _x + P 9n (k-l)+l a kSk-i; 

?k = r fc _i - p Sn (fe_i)+i5fcWfc_i; 
Fnc/ 

Ifr n (k) < n 

z d = u k , g k = 0, z w = 0; 

For s = fe — n, • • ■ ,g n (k)n — 1 and <7 n (fc) > 1 

^ ) = _q ^(s+l) Z(i / Cs; 

z d = z d + (3i k) d s ; 
gfc = gfc + w y g s ; 

— _L «( fc ) 

z ui — z ui + Ps W S; 

Fnd 

z w = r fc + Pg n (k)+l z wj 

Pg n (k)n = z w / c g n {k)n'> % $ gn ( k ) n = Pgn(k)+lPg n ( k ) n 

— _L 

Z w - Z w + Pg n (k)n W 9n(k)n, 

g k =g k + Z w + g k J {k) J 'Pg n ( k ) + l)Sg n ( k ) n ; 

For s = g n (k)n + 1, • • ■ , k — 1 

/?i ^ = -q^( s+ i) z M>/c s ; 
gfc = gfc + Ps g s ; 

z w = z w + d s ; 
Fnd 

d k = z w - u k ; 

c k = q rn ( fc+ i)dfe; 
w fc = Ag fc ; 
F/se 

@g n (k)n = ~ < il r k/ c g n (k)n,' % (3 gn ( k ) n = Pg n (k)+lfi ' 9n (k)n 

z w - r k + /5 Sn(A . )ra w 9n(fc) „; 

gfc = z w + (/5^(fc )n /p 9n (fc) + i)g Sn (fc)„; 
For s = g n (k)n + 1, • • ■ , k — 1 
/?i ^ = -q^( s+1 ) z «,/c s ; 

gfc = gfc +(3.s 'g s ; 

z w = z ui "i" /^s d s ; 
End 

Wfc = Ag fc ; 

c ^ = q^(fc+i) w fe/ 



End 
.End 



Remarks: 
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Table 4.1. Average cost per (fc-)iteration of Algorithm 19.11 and its storage requirement. 



Preconditioning (M x v) 


1+i 


Vector addition (u ± v) 


2-± 


Matvec (Av) 




Saxpy (u + av) 


U 2 

max(2.5n + 2.5 ,6) 

n 


dot product (u H v) 


7^ 

n+ 1 + - 

n 


Storage 


A + M + (4n + 5)N + 0{n) 



(i) Algorithm 14. 1 1 does not compute the quantities and when r n (k) = n (see Lines 13-15 
and Lines 39-50). 

(ii) if the u^, on Line 6 happens to be zero, then the p gn ( k )+l on Line 8 and therefore the x k 
and r k on Lines 9 and 10 will not be computable. In this case, however, the x k on Line 7 
will be the exact solution to system (|3.ip and Algorithm 14.11 stops there. 

We now compare Algorithm 14.11 with the ML(n)BiCGStab algorithm in [39J. First, the defi- 
nitions of rfc, u k and g k are the same in both algorithms, but is defined differently. In [39 1, 
dfe = 9n (jfc)(A)gfc. In exact arithmetic, however, both algorithms compute the same p 9n (k)+ii r k 
and Xfc. Second, the derivation of Algorithm 14.11 has been made simpler by using index functions. 
As a result, some redundant operations in Algorithm 2 of [39] can been seen and removed and 
some arithmetics are simplified. For example, the vectors d k ,u k are computed in every iteration 
in Algorithm 2 of |39j . They are now computed only when r n (k) < n. Also, the expression of 

8 f ,s on Line 39 of Algorithm 14. II is simpler. Some other minor changes were also made so that 
the algorithm becomes more efficient. 

Computational cost and storage requirement of Algorithm 14.11 obtained based on its precondi- 
tioned version, Algorithm 19.11 in ^9l are summarized in Table I4TT1 Since the vectors {qi, . . . ,q n } ; 
{dfc-n, • • • ,d Sn (fc) n -i,d fln (fc) n+ i, . . . ,dfc_i}, {gfc-n, • • • ,gfc-i} and {w fc _ n , . . . , w Sn ( fc )„, w fc _i} are 
required in iteration k, they must be stored. When n is large, this storage is dominant. So, the 
storage requirement of the algorithm is about AnN. 

4.3. Properties. We summarize the properties of Algorithm 14.11 in the following proposition. 
Since ro = ro by (14. 4jl . v (see §3. 2D is also the degree of the minimal polynomial of ro with respect 
to A. 

Proposition 4.1. Under the assumptions of Proposition [3A\ if p gn f k )+i and ~ VPg«(fc)+i ^ 
cr(A) for 1 < k < v — 1, where <r(A) is the spectrum of A, then Algorithm does not break 
down by zero division for k = 1,2, •• • ,u, and x u is the exact solution of \3.1\) . Moreover, the 
computed quantities satisfy 

(a) x fc E x +/C 9n(A .) +fc+1 (A,ro) andr k = b-Ax fc G r +A/C 9n ( fc)+fe+1 (A, r ) fori <k < v-l. 

(b) rfc ^ for 1 < k < v — 1 and r u = 0. 

(c) r k JL qi for 1 < k < v — 1 with r n (k) = n. 

(d) u k _L span{q 1 ,q 2 , ■ ■ ■ ,q rn (fc)} and u k JL q rn ( k )+l for \ <k <v - \ with r n (k) < n. 

(e) d k _L span{q 1 ,q 2 , ■ ■ ■ ,q rn (fc)} and d k / q rn (k)+l for \ <k <v - \ with r n (k) < n. 

Proof. The divisors in Algorithm 14. II are c k , || Au^ ||| and p 9n ( k )+i respectively, where the p's have 
been assumed to be nonzero. By Proposition 13.1( c). we have Ar^ ^ for 1 < k < v — 1. Since 
— 1/p f(A) by assumption, (f> g f k \(A) is nonsingular. Hence Au^ = <p gn ^(A) Ar k ^ (see 
(14. 3|) for the first equation). Therefore, ||Aufc|| 2 ^ for 1 < k < v — 1. 
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c k is defined respectively on Lines 37 and 49 in the algorithm. When r n (k) < n, we have c k = 
<tf n (k+i) d k- In this case, c k = p gn{k )+i<ir n ( k +i) A K(k)(^)Sk = P« ?n (fc)+iq^ (fe+ i ) A^ n(fc+1) (A)g fc = 

^(fe)+i c t(K) )) Pf+l A ^ = Pa-W+iciw^i*** = c tt+" )p k + ASk by », Proposition 
12. llf d) . Corollary 13, 1( (|4.ip and (|4.2p . Since the p's are nonzero and pj^Agfc 7^ by Proposi- 
tion 13.1( g) , we have c^jKj 1 * / and hence c fc / 0. When r n (k) = n, on the other hand, c k = 
<(fc+i) w fc = < (fe+ i)Agfc. In this case, c k = q£ (fc+1) A0 9n(fc)+1 (A) g fc = q£ (fc+1) A<^ n(fe+1) (A) g fe 

= c gn n (fc+i) ))p fc+i A ^ fc = c fl« n (l)+i p fc+i A S* ^ °- Therefore, in either case, we always have c k / 
for 1 < & < v — 1, Moreover, Co = qf^wo = q^ Ago according to Line 2 of the algorithm. Since 
pi = qi by d3T2l) and g = go by flO]), c / by Proposition Ejjg) . 

Now that ||Aufc||2 7^ and p gn ( k )+i 7^ for 1 < k < v — 1 and 7^ for < k < v — 1, 
Algorithm 14.11 does not break down by zero division in the first v — 1 iterations. When k = 
u fc = u,, = (f) gn („)(A)r u = and r fc = r v = (f> gn ^ +x (A)r v = due to r u = by Proposition E7Q 
If it happens that r n (v) = 1, then the x k (= x u ) on Line 7 is the exact solution to system (|3.1|) 
because its residual is zero. So, the algorithm stops there. Otherwise, the x k (= x^) on Line 
16 will be exact with residual r u = and where the algorithm stops. 

Part (a) follows from the definition of in (|4.3p and Proposition 13. If a). 

Since 7^ for 1 < k < u — lby Proposition I3.1f b) and 4> gn ^ + i(A) is nonsingular due to 
— 1/p c(A), we have = (p gn ^ +1 (A)r k 7^ 0. Therefore, Part (b) holds. 

For Part (c), write k = jn + n with < j. By (|4.3|) . (|4.ip and Corollary 13.11 we have qf^r^ = 
qf K(k)+i( A ) ? k = q^ ((i+1)n+1) 9 „((j+i)n+i)(A)? fc = q^ (fc+ i)0 9n (fc+i)(A)r fc = c^f^^pf+i^ = 

4tcS+i 1} Pfc+i**- Now Part ( c ) follows from Proposition Elld) and c^jj^ # 0. 

For the proof of Part (d), we first note that Algorithm 14.11 does not compute when r n (k) = n 
(see Lines 13 - 15). Write k = jn+i as in (|2.1|) and let 1 < t < i < n. Then r n {k) = i, g n {k) = j = 
9n{jn + t) and r n (jn + t) = t. Now, by (|4.3p and Corollary 13. 11 we have qf u^. = q^ 1 ' <j) gn ^{A)v k = 

q^y„+t)^yn+t)(A)r fc = c g 9 ™^^ ] pf n+t r k . Since pf n+t r k = by Proposition ED^d) , qf u fe = 

for 1 < t < i. Similarly, qf lUfc = c^ ( g }) pf n+i+1 r k = c^ { g )} p* +1 r k (the validity of the first 

equation requires i < n). Because of Proposition 13. ll fd) and c g 9n ^ k \ 7^ 0, qf? +1 xi k 7^ 0. 

Similar to the quantity u^, Algorithm 14.11 does not compute d k when r n (k) = n (see Lines 40 
- 49). By (|4.3|) . = p g ( fc ) +1 A<^> ff ^(A)% k and the proof of Part (e) is parallel to that of Part 
(d). ■ 

The conditions of p gn ( k ) + \ 7^ and — l/p 9n (k)+i cr (A) can be easily made satisfied. For 
example, one can add some small random noise (e.g., N(0,d) with 5 <C 1) to p 9n ( k )+i after it is 
computed. 



Corollary 4.1. Consider the case where n = 1, $3. 1}) is a real system and qi G TZ is a random 
vector with iid elements from N(0, 1). // some small random number is added to p 97l (fc)+i after it 
is computed so that p 9n (fc)+i 7^ and — l//0 9n (fc)+i ^ ^(A), then Algorithm ^. 1\ will work almost 
surely without breakdown by zero division to find a solution of \3.1\) from the affine space xq + 
span{ A'ro|t € A/"o} provided that xo € 1Z N is chosen such that the affine space contains a solution 
to < TO) . 

Proposition 14.11 indicates that exact solution can only be found at iteration k = v. It is 
possible, however, that ||rfc||2 can become very small for some k < v. In practice, we terminate 
the algorithm when ||rfc||2 falls within a given tolerance. 
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As in the case of BiCGStab, ML(n)BiCGStab can encounter a breakdown in its implementation. 
ML(n)BiCGStab, besides the two types of breakdown of ML(n)BiCG, has one more type of 
breakdown caused by p gn (k)+i- I n more detail, the divisors in Algorithm 14.11 are c&, ||Aufc|| 2 and 
p gn fk)+i- ^ || Au/. H2 = 0, then p gn Ck)+i = 00 an d a breakdown due to the overflow of p 9n (k)+i 
occurs. Under the assumptions of Proposition 14. 1^ on the other hand, it can be shown (see the 
proof of the proposition) that c k = c ^n!\+i Pfe+i-A-gfc, where c g 9n ^?^^ is the leading coefficient 
of Sn (fc)+i(A) (see (|4.ip ). So, c k is a quantity that relates to p 9n (k)+i an d the ML(n)BiCG divisor 
pf +1 Ag fc . Thus, either p gn ( k )+i = or p^ +1 Ag fe = can cause c k = 0. 

5. A Second ML(n)BiCGSTAB Algorithm 

If we write k = jn + i as in (12. ip . the defined by (|4.4|) then becomes 

(5.1) r jn+i = 0j+i(A) r jn+i 

where i = 1, 2, • • ■ , n and j = 0, 1, 2, • • • . 

Starting with k = 1, let us call every n consecutive ^-iterations a "cycle", namely, iterations 
k = 1, 2, • • • , n form the first cycle, iterations = n + 1, n + 2, • • • ,n + n the second cycle and so 
on. Then (|5.ip increases the degree of the polynomial (ft by 1 at the beginning of every cycle. For 
example, consider n = 3. Then (15. ip implies that 

ri = </>i(A)ri, r 4 = c/> 2 (A)r 4 , r 7 = </> 3 (A)r 7 , 
r 2 = <£i(A)r 25 r 5 = 2 (A)r 5 , r 8 = </> 3 (A)r 8 , 
r 3 = </>i(A)r 3 , r 6 = </> 2 (A)r 6 , r 9 = </> 3 (A)r 9 . 

Iteration = 4 is the first iteration of the second cycle and the degree of (ft is increased from 1 to 
2 there. 

One can define r k by increasing the degree of (ft by one anywhere within a cycle. Correspond- 
ingly, (we believe) the definition will lead to a different algorithm of ML(n)BiCGStab. As an 
illustration, let us increase the degree of (ft at the end of every cycle and derive the algorithm 
associated with it. 

5.1. Notation and Definitions. Let (ft k (X) be defined as in (jl.ip . For k € Af, define 

( 52 ) r k = (ft gn{k+1) (A)? k , gfc = 0g n (fc+i)(A)g fc , 

Ufc = (ft 9n ( k )(A)r k , w k = Ag fe . 

and set 

(5.3) r = r and go = go- 

The vector r k is considered to be the residual of the approximate solution x k computed. We 
remark that = u k when r n {k) < n since g n (k + 1) = g n (k) in this case. 

Definition (I5.2p increases the degree of (ft at the end of a cycle. To see this, let n = 3. Then 
(|FT2l) yields 

ri = o (A)ri, r 4 = </>i(A)r 4 , r 7 = </> 2 (A)r 7 , 
r 2 = </> (A)r 2 , r 5 = ^i(A)r 5 , r 8 = </> 2 (A)r 8 , 
r 3 = 0i(A)r 3 , r 6 = </> 2 (A)r 6 , r 9 = (ft 3 (A)r 9 . 

5.2. Algorithm Derivation. To derive the algorithm associated with (|5.2p . we first transform 
Algorithm 13. II (forgetting Lines 1, 2, 5 and 11) into the following version which is computationally 
equivalent to Algorithm 13.11 but is more convenient for us to apply Proposition 12.11 



Derivation Stage #5. 



ML(n)BICGSTAB: REFORMULATION, ANALYSIS AND IMPLEMENTATION 19 
1. For k = 1, 2, • • • , until convergence: 



2. a k = pf r fc _i/pf Agfc_i; 

3. Ifr»(Jfe)<n 

4. r fc = r fe _i - a fc Agfc_i; 

5. For s = max(/c — n, 0), g n (k)n — 1 

6- # = -Pf +1 A (r, + E f S =L x(fc -n,o) ti k) %t) /pf + iAg s ; 

7. End 

8. For s = g n (k)n, ■ ■ ■ , k — 1 

9- # = -P?+iA (?* + Zttttn,*) ti k) & + 4^) /Pf+l Ag- 

io. End 

ii 9, _c, , v^fln(fc)n-l , y^fc-l 0W3 . 

u ' & fc ~~ 1 fc ^ Z^ s=max (fc-n,0) l Js & s ^ ^s=g„(k)n ^ & s ' 

12. Else 

13. r fc = r fe _i - afcAg fc _i; 

14. For s = g n (k)n, ■ ■ ■ , k — 1 

15. rf fc) = -Pf +1 A (r, + Zt=l (k)n A W ») /Pf +1 Ag s ; 

16. End 

17. g fc = r fe + Esi(fc)n & (fc) g«; 

18. End 



19. End 

Then we transform DS#5 as follows by Corollary 13.11 
Derivation Stage #6. 

1. For k = 1, 2, • • • , until convergence: 

2. a fe = q^ (jfc) gnW (A)r fe _ 1 /qJ (fe) A0 ffn(fe) (A)g fe _ 1 ; 

3. Ifr B (jfc)<n 

4- gii(fc) (A)r fc = gij(fc) (A)r fc _i - a k A<f> gn ( k )(A)g k -i] 

5. For s = m&x(k — n, 0), • • • , g n (k)n — 1 

6 - ^ = -<(S + 1) (^n(s + l) + l(A)?fc 

+ Pg„( S +l)+l E?=Lx(fc-n,0) ^t fc)A ^(*+l)( A )§t) /P 9 „(s+l)+iq^( s+ i)A^ n(s+ i)(A)g s ; 

7. End 

8. For s = g n (k)n, ■ ■ ■ , k — 1 

9- ^ = -<( S+ 1) A (<Wl)( A )^ + ^^-,0) tf°^(. + i)(A)& 

+ E S t=l n (k)nl 3 i k)( l ) g n (s+i)( A )St) /q^ (s+ i)A^ n(s+1) (A)g s ; 

10. End 

1L s „(fc+i)(A)g fc = a „( fc+ i)(A)r fe + (p gn{k+1) A + I) Ef=mi(& 1 - n> 0) ^Vg„(ifc+i)-i(A)g s + 

J2 h s =l n (k)n ^ k) 4> gn (k+i){A)g s ; 

12. Else 

13 - 4> 9n (k)(.A)r k = gn(fc) (A)r fc „! - Q fc A^ 9n(fc) (A)g fc _ 1 ; 

14 - ^(fc+i)( A )r* = (P 9 „(fc+i)A + l)(l>g n ( k+ i)-i(A)r k ; 
15. For s = g n (k)n, ■ ■ ■ , k — 1 

16 - = -<( s +l) (^M( s +l)+i(A)r fe + 
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Pg„( S +l)+l Et=g„(fc)n^? )A ^n(s+l)( A )gt) /P9n( S +i)+iq^( S +i) A ^(^+i)( A )^; 

17. End 

18. 9n(fc+1) (A)g fc = cf)g n{k+1) (A)? k + (p Sri{jfc+ i)A + I)E^ ff 1 n (A ; )n/ 3 ^V« ?Il (fc+i)-i( A )g s ; 

19. End 

20. End 

Lines 4, 11, 13 and 18, DS#6, were obtained from Lines 4, 11, 13 and 17, DS#5, by multiplying 
them with (f) gn ^(A) and 0w fc+1 )(A) respectively. Line 14, DS#6, is a direct result of the 
definition (jl.ip of 0. 

Now we use Proposition 12. II to write DS#6 as 

Derivation Stage #7. 

1. For A; = 1, 2, • • • , until convergence: 

2. a fe = q^ (jfc) gnW (A)r fe _ 1 /qJ (fe) A0 ffn(fe) (A)g fe _ 1 ; 

3. If r n (k)<n 

4- </><?„ (fc+i)( A )r& = 9n(fc) (A)r fc _i - a fc A<£ 9n ( & )(A)g fe _i; 

5. For s = max(/c — n, 0), • • • , g n (k)n — 1 

6 - = -<(.+!) (^n(fc+i)( A )rfc 

+ p 9n (fe+i) E* =max(fc— n,0) / 3 f )A< /'< ? n(t+l)( A )gt) /Pff „(fc+i)q^ i ( ;s+ i) A( / > g„(s+i)( A )gs; 

7. End 

8. For s = g n (k)n, ■ ■ ■ , k — 1 

9- # = -< (s+ i) A (A)? fc + EtttlUo) /f V(m)+i( A )gt 

)(A)a) /q^ (s+1) A0 9n(s+1) (A)g s ; 

10. End 

11. S „(fc+l)( A )gfc = S „(fc+l)( A )rfc + (P s „(jfc+1)A + I) E!=m a x(fc-n,0) /^^(s+l) ( A )gs + 

^s=g n (k)n /3rV S7l(s+ i)( A )g s ; 

12. Else 

13. 5n(fc )(A)r fc = 5n(fe) (A)r fe _ 1 - aj fc A0 ffn ( fc )(A)g fc _ 1 ; 

14- ^(fc+l)( A )r* = (P 9 „(fc+i) A + I )0 9 „(A ; )( A )rfc; 

15. For s = g n (k)n, ■ ■ ■ , k — 1 

16- = -<(s+i) (^„(fc+i)( A )rfc+ 

A?„(fc+1) Et S = 9 1 n (fc)n/ 3 f )A ^(t+l)( A )gt) /^(Hll^^ljHr.KljWfc; 

17. End 

18. 9n (fc+i)( A )gfc = 9n (fc+i)( A )?fc + (p ffn (fc+i) A + 1) E^ n ( fc )n /3^V Sn (,+i)( A )g s ; 

19. End 

20. End 

We remark that the term 9 „(t+i)+i(A)gf in the first sum on Line 9 can be further written as 

( 54 ) </V(t+i)+i( A )gt = (/? 9n (t+i)+i A + I)<^ n (t+i)( A )gt 

= (Pg„(fc+l) A + I )^(m)( A )gi- 



Substituting (15.4p and (15.2p into DS#7 then yields a set of updating relations of the vectors 
defined by (IP]) . 
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Derivation Stage #8. 

1. For k = 1, 2, • • • , until convergence: 



2. ai = q ftlW r*-i/< w Wi-ii 

3. Kr»(Jb)<n 

4. r k = r fe _i - a fc Wfe_i; 

5. For s = max(k — n, 0), g n (k)n — 1 

« _ H ( i v-^s— l \ / 

b - - _( lr„(s+l) ^ + Pg n (k+1) Z^t=max(fe-n,0) ^ w tJ / Pg„(fc+l)qr n (s+l) Ws > 

7. End 

8. For s = g n (k)n, ■ ■ ■ ,k — l 

9- ^ = -< (s+ i) ( Ar * + /fW+D A + IK 

+ E s tZl {k) n /3j fc) w t ) / qf n(s+1) w s ; 

10. End 

11. gfc — r fc + P 9n (fc+1) 2^s=max(fc-n,0) ^ s Ws + ^ s=max (fc-n,0) Ps gs + l^s=g n (k)n P s 5s, 

12. Else 

13. u fc = r fc _i - ajfcWfc-i; 

14. r fc = (p 9n(fc+1) A + I)u fc ; 

15. For s = g n (k)n, ■ ■ ■ , k — 1 

16 - = -q^( S +i) ( r fc + ^(^+1) ES,(*)n & (fc)w *) / p 9 „(fc+i)q^ (s +i) w s ; 

17. End 

18. gfc = r fc + P 9n (A;+l) Es=9„(i)« ^ W s + Es=g n (fc)n Z 3 ^ Ss! 

19. End 



20. End 



DS#8 does not contain any update about w&. For the updates, we multiply the equations on 
Lines 11 and 18 by A to get 



w — A f r I n ^g n (k)n-l a( k )„ T \ I v^9n(fc)n-l a( k )„ T 

/c k\ W fc - A l r fc + Pffn(fc+1) Z^ s=m ax(fe-n,0) P s Ws J + 2^ s=ma x(fe-n,0) P s Ws 

+ Lus=g n (k)nP s Ws 



fl (fc) 

if r n (/c) < n, and 

( 5 - 6 ) w fc = A ( r fc + P 9n (k+1) £^ n (fc)„ # W s ) + E^ n (fc)n 0- ^ 

if r n (fe) = n. 

Again, we consider to be a residual. To be consistent with Lines 4, 13 and 14, we update 
the solution vector as 



(5.7) x fc 



xa,-i + ajfcgfc-ij if »"„(¥) < n 

-p ff „(fc+i)Ufe + + ajfcgfc-i, if r n (k) = n. 

Now adding (15. 5|) . (15.61) and (15.7)) to DS#8 and simplifying the operations appropriately, we then 
arrive at the following algorithm. The free parameter p gn (k+i) * s chosen to minimize the 2-norm 
of r fc . 



Algorithm 5.1. ML(n)BiCGStab without preconditioning associated with definition 

mi 
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1. Choose an initial guess xo and n vectors qi,q2, • • • , q„. 

2. Compute yq = b — Axo and go = ro, wo = Ago, cq = q^wo- 

3. For k = 1, 2, • • • , until convergence: 



4- ctk = % n {k) r k-\/ck-i; 

5. If r n (k) < n 

6. Xfc = Xfe_i + ajfegfe_i; 

7. r k = rfc_i - a fc w fc _i; 
5. z w = r k , g fc = 0; 

5. For s = max(/c — n, 0), • • • , g n {k)n — 1 

Ps k) = -q?, s+l) z w /c s ; % = Pi k) p gn{k+ i) 

~(k) 

11. z w = z w + f3s 'w s ; 

12. Kk = Sk + pi k) E*; 

13. End 

1 

14. g k =z w -\ g fc ; 

15. Wfc = Agfc/ 

75. For s = g n (k)n, ■ ■ ■ , k — 1 

i7. = -< (s+1) w fe /c s ; 

15. Wfc = Wfc + /3i fc) w s ; 

19. gfc = gfc + #g s ; 

SO. End 

21. E/se 

£2. x fc = Xfc_i + afcgfc_i; 

25. Ufc = rjfc_i - a fe w fc _i; 

^- P S „(fe+l) = -(Aufc)^Ufc/||Aufc|||; 

25. x fe = Xfc - /9 9n (fc +1 )Ufc; 

2<J. r k = p 9n ( k+ i) Aufc + u fc ; 

27. z w = r fc , gfc = 0; 

28. For s = g n (k)n,- ■ ■ ,k - 1 

29. Pf ] = - q « n{s+1) z w / Cs ; % ^ = ti k) Pgn{k+l) 

~(k) 

30. z w = z w + f3s 'w s ; 

31. Sk = gk + pi k) s s ; 

32. End 

33. gfc = z w H gfc/ 

Pg„(fc+i) 

34. Wfc = Ag fc ; 
55. End 

36. Cfc = q^ (fe+1) Wfc; 



57. End 

We remark that (i) the algorithm does not compute Ufc when r n (k) < n. In fact, u k = r k when 
r n {k) < n (see the remark right after (|5.2p ); (ii) if the Ufc on Line 23 happens to be zero, then the 
Xfc on Line 22 will be the exact solution to system (13, lj) and the algorithm stops there. 

The cost and storage requirement, obtained from its preconditioned version, Algorithm 19.21 in 
are listed in Table I5TT1 Compared to Algorithm 14.11 Algorithm 15.11 saves about 20% in saxpy. 
Since only three sets of vectors {qi, . . . , q n }, {gfc- n > • • • , gfc-i} and {wfc_ n , . . . , Wfc_i} are needed 
in iteration k, the storage is about 3nN besides storing A and M. 
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Table 5.1. Average cost per (k-) iteration step of Algorithm [9^21 and its storage requirement. 



Preconditioning (M x v) 


1+i 


Vector addition (u ± v) 


1 


Matvec (Av) 




Saxpy (u + av) 


2 

2n + 2 + - 

n 


dot product (u H v) 


7^ 

n+ 1 + - 

n 


Storage 


A + M + (3n + 5)N + 0(n) 



5.3. Properties. We summarize the properties about Algorithm 15.11 below. Their proofs are 
similar to those in Proposition 14.11 Since ro = ro by (15. 3p . v is also the degree of the minimal 
polynomial of tq with respect to A. 

Proposition 5.1. Under the assumptions of Proposition HOI if p g tk+l) an d ~ 1/Pg n (fc+1) ^ 
cr(A) for 1 < k < v— 1, f/ien Alaorithm \5.1\ does not break down by zero division for k = 1, 2, • • • , i> ; 
and f/ie approximate solution x u at step k = v is exact to the system \3.1\) . Moreover, the computed 
quantities satisfy 

(a) Xfc G XQ+span{rQ, Aro, . . . , A Sn ( fc+1 ) +fe-1 ro} and r k = b— Ax^. G ro+spem{Aro, A 2 ro, . . . , A 9n ( fc+1 ) +fc ro 
for 1 < fe < v — 1. 

(b) r fe ^ /or 1 < A; < v - 1/ r v = 0. 

(c) r fc _L span{q%, q 2 , • • • , q r „(fc)} r fc / qr n (fc)+i /or 1 < A; < ^ - 1 wit/i r n (fc) < n; r fc / qi 
for 1 < k < v — 1 mt/t r n (/c) = n. 

(d) Ufc _L span{qi, q2, • • • , q n } for 1 <k <u with r n (k) = n. 

(e) Ag fc _L span{qi,q 2 , • • • ,q r „(fc)} Ag fc / q rn (fc)+i for 1 < k < v - 1 r n (fc) < n; 
Agfe qi /or 1 < k < v — 1 with r n (k) = n. 



6. Relations to Some Other Methods 

In this section, we discuss the relations of ML(n)BiCGStab with the FOM, BiCGStab and 
IDR(n) methods under the exact arithmetic environment. 

6.1. Algorithm 14.11 

(1) Relation with FOM [22]. Consider the case where n > v. In this case, g n {k) = and 
f n {k) = k for k = 1, 2, • • • , v . Hence = q& by (|3.2p . If we choose q^ = r^_i in 
Algorithm 13.11 (it is possible since ffc-i is computed before q^ is used), then the and 
rfc computed by the algorithm satisfy 

/ 6 1 n f x fc G x + span{r 0l Ar , . . . , A fc-1 r }, 

\r k ± apan{r ,ri, . . . ,r fc -i} 

for 1 < k < v by Proposition 13.1( a). (d). (|6,ip is what the FOM approximate solution 
x f , ° A/ needs to satisfy. Therefore, when n > v and with the choice q& = ?fc_i, Algorithm 
13. H is mathematically equivalent to FOM. 

Now, from (|4,3p . the r^ computed by Algorithm 14. II satisfies 
r k = ^ 3 „(fc)+i(A) r k = <f>x(A)r k = (pi A + I) r k . 
Note that u^. = 4> gn ^(A)ri t = (^o(A)r^ = r^. Thus, for 1 < k < v, r k is the factor pi A + I 
times the FOM residual u k if we set qi = ro and q/c+i = u& in Algorithm 14. II 6 



In [39], a remark immediately following Theorem 4.1 states that, when n > v and with the choice that qi = 
0i(A H )<^i(A)ro and = 0i(A H )rA;_i for k > 2, the and computed by Algorithm 2 (which is mathematically 
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(2) Relation with BiCGStab |32j. When n = 1, we have g n (k) = k — 1 and r n (k) = 1 for 

k £ J\f. Hence = (A^) fe 1 qi by (|3,2|) . By Proposition 13.1( a) and (d), the and 
computed by Algorithm 13.11 satisfy 

^ ^ | x fe G So + span{r , Ar , . . . , A*" 1 ^} 

I r k _L span{qi, A H q 1 , . . . , (A H ) -1 qi} 

for 1 < k < v. (|6.2p is what the BiCG approximate solution xjP* needs to satisfy. 
Therefore, when n = 1, Algorithm 13.11 is mathematically equivalent to BiCG. 

Now, from (|4.3p . the computed by Algorithm 14. II satisfies 

r k = ( t ) g n (k)+i{A)r k = 4>k{A) r k 

which is the definition of the BiCGStab residuals. Thus Algorithm 14.11 is mathematically 
equivalent to BiCGStab when n = 1. 

(3) Relation with IDR(n) [31]. Write k = jn + i as in (JX]} with 1 < i < n,0 < j. Let 

Qo = K(A, tq) be the complete Krylov space and let S = span{qi,q2, • • • ,q n } ± - Define 
the Sonneveld spaces 

g j+ i = ( Pj+ iA + i)(gj ns) = ( Pgn{k)+1 A + 1)(^- n s) 

for j = 0, 1, 2, • • • . By g3D, we have 

Yjn+i = <Aj+i(A)rj n+i = (pj+\A + I)<t)j(A)r jn+i = (pj+iA + I)u jn+i . 

From Proposition 14.1( d). Uj n+i JL q i+1 if i < n. Hence uj n+i Qj n 5 and therefore 
r jn+i ^ ^j+i when i < n. From this point of view, Algorithm 14.11 is not a IDR(n) 
algorithm. 

6.2. Algorithm 15.11 

(1) Relation with FOM . When n > v, g n {k) = and r n (k) = k for 1 < k < v and Algorithm 
13.11 with the choice q^ = v k -\, is a FOM algorithm as seen in £ )6.11 Now, from (|5.2p . the 
r k computed by Algorithm 15.11 satisfies 

rfc = 4> gn (k+i)(A)r k = 4> (A)r k =r k . 

Thus Algorithm 15.11 is a FOM algorithm when we set = r k -i- 

(2) Relation with BiCGStab . When n = 1, we have g n {k) = k — 1 and r n (fc) = 1 for k E A/" 
and Algorithm 13. II is a BiCG algorithm. Now, from (I5.2p . the r k computed by Algorithm 
15.11 satisfies 

rfc = <P 9n (k+i) (A)% = <t) k {A)r k 
which is the definition of the BiCGStab residuals. Thus Algorithm 15.11 is mathematically 
equivalent to BiCGStab. 

(3) Relation with IDR(n). Write k = jn + i as in (J2H]) with 1 < i < n,0 < j. By (jO) . we 
have 

, . , , A ^ f 0j(A)r, n+ j if 1 < i < n, 

r jn+i - 9nOn+i+1) (A)r jn+i - j ^. +i(A) ~ n+ . if • = n 



equivalent to Algorithm 14. 1 1 of this paper) will satisfies (|6.I[) and therefore Algorithm 2 is a FOM. The argument 
there about this remark is not correct. The author remembers that the referees of [39] were skeptical about the 
argument. 
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By (|3.2p and Proposition 13. If d). we have 



J (j)t(A)rj n+ i € S if 1 < i < n and < t < j, 
\ t (A)r in+ri e S if < t < j. 



Thus, by induction on t, we have 



j (pt(A)rj n+i eQ t C\S if 1 < i < n and < t < j, 
{ ^(A)r in+n eg t n5 ifO<t<j. 

Therefore, by (|6.3p . 




[ rjn+n = (Pi+iA + T)4>j(A)r jn+ri G C/j +1 . 

So, the residuals in (|6.3p he in the Sonneveld spaces <7 and therefore Algorithm 15.11 is a 
IDR(n) algortithm. 



A preconditioned ML(n)BiCGStab algorithm can be obtained by applying either Algorithm 
14.11 or Algorithm 15.11 to the system 



where M is nonsingular, then recovering x through x = M _1 y. The resulting algorithms, Al- 
gorithm 19.11 and Algorithm 19. 2\ together with their Matlab codes are presented in §9.11 and §9.21 
respectively. To avoid calling the index functions r n {k) and g n (k) every ^-iteration, we have split 
the fc-loop into a i-loop and a j-loop where i, j, k are related by (12.lt) with 1 < i < n, < j. 
Moreover, we have optimized the operations as possible as we can in the resulting preconditioned 
algorithms. 

Since we have compared ML(n)BiCGStab with some existing methods in [39], we will only con- 
centrate on the performance of ML(n)BiCGStab itself. The following test data were downloaded 
from Matrix Market. 7 

(1) e20r0100, DRIVCAV Fluid Dynamics. e20r0100 contains a 4241 x 4241 real unsymmetric 
matrix A with 131, 556 nonzero entries and a real right-hand side b. 

(2) qc2534, H2PLUS Quantum Chemistry, NEP Collection. qc2534 contains a 2534 x 2534 
complex symmetric indefinite matrix with 463, 360 nonzero entries, but does not provide 
the right-hand side b. Following [25], we set b = Al with 1 = [1, , 1, • • • , 1] T . 

(3) utm5940, TOKAMAK Nuclear Physics (Plasmas). utm5940 contains a 5940 x 5940 real 
unsymmetric matrix A with 83, 842 nonzero entries and a real right-hand side b. 

Experiments were performed in Matlab Version 7.1 on a Windows XP machine with a Pentium 
4 processor. ILU(0) preconditioner (p. 294, [21]) has been used in all the experiments. For 
e20r0100, the [/-factor of the ILU(0) decomposition of A has some zeros along its main diagonal. 
In that experiment, we replaced those zeros with 1 so that the [/-factor was invertible. 

In all the experiments, initial guess xq = and stopping criterion is 



where is the computed residual. Except where specified, auxiliary vectors Q = [qi, q2, ■ • • , q n ] 
are chosen to be Q = [ro, randn(N, n — 1)] for e20r0100 and utm5940 and Q = [ro, randn(N, n — 
1) + sqrt(-l) * randn(N, n - 1)] for qc2534. 



7. Implementation Issues 



AM y = b 



rfc||2/||b|| 2 < 10 
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Moreover, for the convenience of our presentation, we introduce the following functions: 

(a) T conv (n) is the time that a ML(n)BiCGStab algorithm takes to converge. 

(b) Iconvin) is the number of iterations that a ML(n)BiCGStab algorithm takes to converge. 

(c) E(n) := ||b — A_x|| 2 / 1| b H2 is the true relative error of x where x is the computed solution 
output by a ML(n)BiCGStab algorithm when it converges. 

7.1. Stability. We plot the graphs of I CO nv(n) in Figures O^a), E^a) andE|a). For e20r0100 
and qc2534, Iconv{n) decreases as n increases. However, the Iconv{n) for utm5940 behaves very 
irregularly due to some of the p's are too small. Recall that ML(n)BiCGStab performs 1 + 1/n 
matrix-vector multiplications (MVs) per iteration on average. In terms of the number of MVs, 
both Algorithms 19.11 and 19.21 are considerably faster than BiCGStab in all the three experiments. 
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Figure 7.1. e20r0100: (a) Graphs of I CO nv(n) against n. BiCGStab took 2620 
iterations/5240 MVs to converge. Full GMRES converged with 308 MVs. (b) 
Graphs of E{n) against n. 
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Figure 7.2. qc2534: (a) Graphs of I CO nv(n) against n. BiCGStab took 329 
iterations/658 MVs to converge. Full GMRES converged with 439 MVs. (b) 
Graphs of E(n) against n. 
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Figure 7.3. utm5940: (a) Graphs of I CO nv(n) against n. BiCGStab took 228 
iterations/455 MVs to converge. Full GMRES converged with 176 MVs. (b) 
Graphs of E(n) against n. 



The graphs of E(n) are plotted in Figures 17.1( b). 17. 21( b) and 17.3( b). It can be seen that 
the computed in Algorithm 19.21 easily diverges from its exact counterpart b — Ax&. This 
divergence becomes significant when n > 15 for e20r0100 and n > 4 for utm5940. By contrast, 
the computed relative errors ||rfc||2/||b||2 by Algorithm 19.11 well approximate their corresponding 
true ones. Thus, from this point of view, we consider that Algorithm 19.11 is numerically more 
stable than Algorithm 19.21 However, Algorithm 19.21 taken twice to form a predictor-corrector pair 
can be an efficient and stable algorithm. We remark that the issues of divergence of computed 
residuals and corresponding remedy techniques have been discussed in detail in [T81I2TII34"] ■ 

7.2. Choice of n. In this and the following subsections, we will focus on Algorithm 19.11 

From the experiments in [39j and this paper, we have observed that ML(n)BiCGStab behaves 
more and more robust as n is increased. So, for an ill-conditioned problem, we would tend to 
suggest a large n for ML(n)BiCGStab. 

ML(n)BiCGStab minimizes 1 1 r ^ H2 once every n iterations. The convergence of a well-conditioned 
problem is usually accelerated by the minimization steps. So, when a problem is well-conditioned, 
we would suggest a small n, say, n < 3. n = 2 may be a good choice since it reduces the MV cost 
by 25% per iteration while keeping the minimization performed with a high frequency. 

We also plot the graphs of T conv (n) in Figures PTT41 and [7T5l a) to provide more information on 
how n affects the performance of ML(n)BiCGStab. 

7.3. Choice of p. The standard choice for the Pj+i in Algorithm 19. II is 

(7.1) Pj+i = -(Au jn+1 ) H u jn+1 /\\Au jn+1 \\l. 

This choice of Pj+i minimizes the 2-norm of Tj n +i = Pj+l A5j n+ i + Uj n+ i, but sometimes can 
cause instability due to that it can be very small during an implementation. A remedy as follows 
has been suggested in [26] : 

Pj+l = — {Auj n+ i) H Uj n+ i/\\ Auj n+ i||2; 

(7.2) uj = (Au jn+1 ) H u jn+ i/(\\Au jn+1 \\ 2 \\u jn+ i\\ 2 ); 

if < k, pj + i = Kpj+i/\u)\) end 

where k is a user-defined parameter. In Figures 17. 5( b) and !7.6( a). we compare the performances of 
Algorithm 19.11 with (|7.1h and (17. 2\i respectively (we only plot the results of qc2534 and utm5940. 
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Figure 7.4. (a) e20r0100: Graph of T conv {n) of Algorithm I9TT1 against n. T conv (n) 
reaches its minimum at n = 22. (b) qc2534: Graph of T conv (n) of Algorithm I9.ll 
against n. T conv (n) reaches its minimum at n — 8. 




Figure 7.5. (a) utm5940: Graph of T conv (n) of Algorithm [9J] against n. T conv (n) 
reaches its minimum at n = 6. (b) qc2534: Graphs of I C onv( n ) of Algorithm I9.ll 
against n with choices (I7.ip and (I7.2p for p respectively. In this experiment, we 
picked k = 0.7. 



The result of e20r0100 with k = 0.1 is analogous to Figure [TToT b)). Also, see the numerical 
experiments in [31] for more information about these p choices. 

7.4. Choice of q's. We usually pick Q = [qi,q2, • • • , q n ] a s 

(7.3) Q = [r , randn(N, n - 1)] 
for a real problem and 

(7.4) Q = [rrj, randn{N, n — 1) + sqrt(-l) * randn(N, n — 1)] 

for a complex problem. In our experiments, however, we observed a comparable performance 
when we chose 

(7.5) Q = [rrj, sign(randn(N, n — 1))]. 
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Figure 7.6. utm5940: (a) Graphs of I CO nv(n) of Algorithm I9.ll against n with 
choices ()7. 1|) and (|7.2p for p respectively. In this experiment, we picked k = 0.7. 
(b) Graphs of Iconv (n) of Algorithm I9.ll against n with choices (|T.3j) and (|7.5p for 
Q respectively. 



or 

(7.6) Q = [rrj, sign{randn{N ', n — 1)) + sqrt[— 1) * sign(randn(N, n — 1))]. 

See Figure ITTBT b) (we only plot the result of utm5940 for saving space). 

The advantages of (|7.5p and (|7.6p over (|7.3j) and (|7.4p are that (i) the storage of Q is substan- 
tially reduced. In fact, we just need to store the random signs (except its first column); (ii) an 
inner product with q«, 2 < i < n, is now reduced to a sum without involving scalar multiplications. 

For other choices for Q, one is referred to |31j . 

8. Concluding Remarks 

With the help of index functions, we re-derived the ML(re)BiCGStab algorithm in [39] in a more 
systematic way. This time, we have been able to find out and remove some redundant operations 
so that the algorithm becomes more efficient. We also realized that there are n ways to define 
the ML(n)BiCGStab residuals r^. Each of the definitions will lead to a different algorithm. We 
presented two definitions together with their associated algorithms, namely, (i) definition (|4.3p . 
increasing the degree of (ft at the beginning of an iteration cycle, and the associated Algorithm 14. 11 
(ii) definition (|5,2p . increasing the degree of (ft at the end of an iteration cycle, and the associated 
Algorithm 15.11 By comparison, Algorithm 15.11 is cheaper in storage and in computational cost, 
faster to converge, but less stable. For other definitions of that increase the degree of (ft 
somewhere within a cycle, we expect that the associated algorithms would lie between Algorithms 
14.11 and 15.11 in terms of computational cost, storage and performance. 

We proved that the Lanczos-based BiCG/BiCGStab and the Arnoldi-based FOM are essentially 
methods of the same type. Both are the extreme cases of ML(n)BiCG /ML(n)BiCGStab. 

In this paper, we did not assume that A is a nonsingular matrix. When a singular system (|3.ip 
is solved, selecting an appropriate initial guess xo is a crucial step. If xrj is selected such that the 
affine space xo + span{A t ro\t G Mo} contains a solution to the system (|3.ip 8 . ML(n)BiCG will 
almost surely converge (see Corollary 13.21) . Otherwise, W6 shall have Pmin 

(0, A,r ) = (see the 

remark before Corollary I3.2p which yields det(S;,) = (see the remark after Proposition 13. ip . In 



^For an example where the affine space contains no solution, consider A = [0, 1; 0, 0],b = [1,0] T and select 
xo = [1,0] T . Note that this linear system is consistent. 
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this case, YYi=i det(Sj) = and therefore there is no guarantee that the LCT-factorizations in the 
construction of ML(n)BiCG exist (see the remark after Proposition 13, ip . As a result, it is likely 
that ||rfc||2 blows up to oo. A similar remark also applies to ML(n)BiCGStab. 

9. Appendix 

In this section, we present the preconditioned ML(n)BiCGStab algorithms together with their 
Matlab codes. 

9.1. ML(n)BiCGStab with Definition (I4.3D . The following algorithm is a preconditioned 
version of Algorithm 14.11 

Algorithm 9.1. ML(n)BiCGStab with preconditioning associated with (|4.3|) . 

1. Choose an initial guess xo and n vectors qi,q2, ■ ■ ■ , qn- 

2. Compute tq = b — Axo and set go = Tq. 

Compute g = M _1 g , w = Ag , c = qf w and e = qf r . 

3. For j = 0,1,2,- •• 



4- a jn+l = — l)n+n/ c (j — l)n+n> 

5- Ujn+1 = Y(j-l)n+n ~ Ctjn+\ W(j_i) n+n ; 

6. x jn+l = x (j — l)n+n a jn+lS(J— l)n+n; 

7. Uj n+ i = M~ 1 U J ' n+ i; 

8. Pj+i = —(Auj n+ i) H \ij n+ i/\\Auj n+ i\\2; 

9- Xjn+l = x jn+l — Pj+l u jn+l,' 

10. Yj n+ \ = pj + \A\ij n+ i + Uj n +i/ 

11. Fori = 1,2, ••• ,n- 1 

fjn+i = Qj+l'fjn+i/ 

/3(j_i) n+i = — fjn+i/ C(j-l)n+i> 
15. Ifi<n-2 

ia — i a(i n +*) 

Zd - u jn+i + P (i „i) n+i a (i _i) n+i ; 

J gjn+i — P(j_i) n+i g(j-l)n+i> 

I*- *w - P^-iyn+i^U-^n+i; 

in ati n +i) _ „H „ /„ 

Jy - Py_i) n+i+ i - qi+2 Z <i/ C (j-l)n+i+l' 

SO. For s = i + 1, • • • ,n - 2 

2i- z d = z d + /3g!^ +s d _ 1)n+s ; 

Sjn+i = gjn+i + ^_i) n+s g(j-l)n+s/ 
23- 2 W =Z W + P{j^l +s W(j-l) n + a ; 

H- /3j_ 1)n+s+1 = -q s+2 z d/ c 0'-i)n+s+i; 

Sjn+i = gjn+i + /3(j_ 1 ) n+n _ 1 g(j-l)n+n-l / 

Zf " = Z <" + /5j_i) rl+n _ 1 w 0'-l)n+n-i; 
So. = rj n _(_j + Pj+\Z W J 

29. Else 

<r,n o{j n +i) 

JU - gjn+i — Py_x) n+n _ig(3-l)n+n-l; 
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JI - Z "> - r jn+i + Pj+lP(j-l)n+n-l W (j-l)n+n-i; 

32. End 

oo o(jn+i) _ u„ w o(jn+i) _ o(jn+i) 

JJ - P(j-i) n+n - Ql z w/ C(j-l)n+n, '0 P (j _ 1)n+n - P3 + lP{j-l) n +n 

q I , a(jn+i) 

'H- z w -z w + p (j _ 1)n+n W( J _ 1)n+n ; 

35- gjn+i = gjn+i + z w + 0^i^ n+n / Pj+l)g(j-l) n+n ; 

36. Else 

on a(jn+i) _ h_ / . <» o(jn+i) _ R {jn+i) 

J/ - PQ_i) n+n - ~Ql r jn+i/ C 0'-l)n+n; /o P(j_l) n+n ~ 

J<? - z w — r jn+i ~r P( J _ 1 ) n+n W( J _ 1 ) n+n; 

55. gjn+i = z w + (^'"t)n+Ti/^'+ 1 )S(j-l)n+Ti/ 

^0. .End 

^i. For s = 1, ■ ■ ■ ,i — 1 

4%- fijn+s ^ = ~ < lf+l z w/cjn+ s ; 

4 J- Sjn+i — gjn+i + Pjn+s gjn+si 

44- z w = z w + Pj n + S dj n + s ; 

45. End 

46. Ifi<n-1 

4*7 ■ ^jn+i — z w Uj n _|_j 7 

4&- Cjn+i = Qi+id-jn+ii 

49. Ctjn+i+l — fjn+i/Cjn+ij % Ctj n +i+i — Oij n+ i + i / Pj+1 

50. Uj n +j_|_i = U^ n -|-j CXj n +i+idj n +i; 

51. Else 

52. c jn+i = Qj+l { z w ~ Ujn+i)> 

53. aj n+ i + i = fj n +i/cj n+ i; % ctj n+ i + i = a>j n+ i + i/pj + i 

54. End 

55. gjn+i = M 1 gj n +i; 

56. Wjn+i = A-gjn+i'j 

57. Xj n _|_j_)_i = Xj n _)_j + pj+io>j n +i+igjn+i! 

55. = r jn+i — Pj>l«j'n+j+l w .jn+j; 

55. -End 

60. ej n 4- n = Yj n +nj 

ci o(jn+n) _ 1 o(jn+n) _ o(jn+n) 

P {j-l)n+n — e jn+n/ t(j~l)n+n, /0 P (j ~l)n+n ~ l J 3 + l P (j -l)n+n 

62. z w = r jn+n + ^ J _ 1 ) n+n W( J _ 1 ) n+n ; 

63. g jn+n = Z W + {^p^ n \j Pj+l)g(j-l) n +n; 

64. Ifn>2 

oU n + n ) H I 

65. (3y n+1 = -c£z w /c jn+1 ; 

66. For s = 1, ■ ■ ■ , n — 2 

Sjn+n = gjn+n + flj n + s ^gjn+sj 
f.„ . o(jn+n), 

00. z w — z w + Pj n+S o.j n -\- s ; 

69. P^n+s+l = ~^+2 z w / Cjn+s+i; 

70. End 

r<j 1 o{jn+n) 

' J - gjn+n — gjn+n + P j n + n -igjn+n~l ! 

72. End 

73. gjn+n = M 1 gjn+n ; 
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75. 
76. End 



w, 



jn+n — -^Sjn+ri) 



Matlab code of Algorithm 19.11 

I. function [x, err, iter, flag] = mlbicgstab(A, x, b, Q, M, max-it, tol, kappa) 
2. 

3. % input: A: N-by-N matrix. M: N-by-N preconditioner matrix. 

4. % Q: N-by-n auxiliary matrix [qi,--- , q n ]. x: initial guess. 

5. % b: right hand side vector. maxJt: maximum number of iterations. 

6. % tol: error tolerance. 

7. % kappa: (real number) minimization step controller: 

8. % kappa = 0, standard minimization 

9. % kappa > 0, Sleijpen-van der Vorst minimization 

10. %output: x: solution computed, err: error norm, iter: number of iterations performed. 

II. % flag: = 0, solution found to tolerance 

12. % = 1, no convergence given max St iterations 

13. % = —1, breakdown. 

14. % storage: D: N x (n — 2) matrix defined only when n > 2. 

15. % G, Q, W: N x n matrices. A,M: N x N matrices. 

16. % x, r, g_t, u,z,b: N x 1 matrices, c: 1 x n matrix. 
17. 

18. iV = size(A, 2); n = size(Q, 2); 

19. G = zeros(N,n); W = zeros(N,n); % initialize workspace for d, g, w and c 

20. ifn>2, D = zeros(N,n - 2); end 

21. c = zerosiA, n); % end initialization 
22. 

23. iter = 0; flag = 1; bnrm2 = norm(b); 

24. if bnrm2 == 0.0, bnrml = 1.0; end 
25. 

26. r = b — A * x; err = norm(r)/bnrm2; 

27. if err < tol, flag = 0; return, end 
28. 

29. G(:, n) = r; g_t = M\r; W(:,n) = A* g_t; 

30. c(n) = Q(:,1)' * W(:,n); 

31. if c(n) == 0, //ag = —1; return, end 

32. e = Q(:,iy *r; 
33. 

34. for j = : maxJt 

35. alpha = e/c(n); 

36. x = x + alpha * gJt; 

37. u = r — alpha * W(:,n); 

38. err = norra(u)/bnrm2; 

39. if err < to/, //ag = 0; iter = iter + 1; return, end 
40. 

41. g_t = M\u; z = A* gJt; rho = z' * z; 

42. if rho == 0, flag = —1; return, end 

43. omega = z' * u; 

44. if omega == 0, /Zag = — 1; return, end 
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45. rho = —omega/rho; 

46. if kappa > 

47. omega = omega / {nor m{z) * norm(u)); 

48. absjom = abs(omega); 

49. if abs_om < kappa, rho = rho * kappa/ abs_om; end 

50. end 

51. x = x — rho * g_t; 

52. r = rho * z + u; 

53. err = norm{r)/bnrm2; 

54. iter = iter + 1; 

55. if err < tol, flag = 0; return, end 

56. if iter >= maxJt, return, end 
57. 

58. for i = 1 : n — 1 

59. / = Q(:,i + l)'*u; 

60. if j >= 1 

61. beta = —f/c{i); 

62. if i <= n - 2 

63. D(:,i) = u + beta * D{:,i); 

64. G(:,i) =6eto*G(:,i); 

65. W(:,i) = 6eta*VF(:,i); 

66. 6eta = -Q(:, i + 2)' * £>(:, i)/c(i + 1); 

67. for s = i + 1 : n — 2 

68. £>(:,») = D(:,i) + beta * D{:, s); 

69. G(:,i) = G(:,i) + 6eta*G(:,s); 

70. W(:,i) = + 6eta * W{:,s); 

71. 6eta = -Q(:, a + 2)' * D(:,i)/c(s + 1); 

72. end 

73. G{:,i) = G{:,i) + beta*G{:,n-l); 

74. W(:,i) = W{:,i) + beta*W{:,n-l); 

75. = r + r/io * 

76. else 

77. G(:,i) = beta*G{:,n- 1); 

78. «) = r + (r/io * beta) * W{:, n - 1); 

79. end 

80. beta = -Q(:,l)' *W(:,i)/c(n); 

81. W(:,i) = W{:,i) + beta* W{:,n); 

82. G(:, t) = G(:, i) + W{:, i) + {beta /rho) * G{:, n); 

83. else 

84. beta = -Q(:, 1)' * r/c(n); 

85. = r + 6eta* W(:,n); 

86. G(:, i) = W(:, i) + {beta/ rho) * G{:, n); 

87. end 

88. for s = 1 : i - 1 

89. 6eta=-Q(:,s + l)'*W(:,i)/c(s); 

90. G(:,i) = G(:,i) + 6eto*G(:,s); 

91. i) = W(:,i) + beta*D{:,s); 

92. end 

93. if i < n - 1 
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94. D(:,i) = W(:,i) -u; 

95. c(i) = Q{:,i + l)' *D{:,i); 

96. if c{i) == 0, flag = — 1; return, end 

97. alpha = f /c(i); 

98. u = u — alpha * D(:,i); 

99. else 

100. c(i) = Q(:,i + iy *(W(:, *)-«); 

101. if c(i) == 0, flag = —1; return, end 

102. alpha = f/c(i); 

103. end 

104. gjt = M\G(:,i); W(:,i) = A*g_t; 

105. alpha = rho * alpha; 

106. x = x + alpha * 

107. r = r — alpha *W(:,i); 

108. err = norm(r)/bnrm2; 

109. iier = iter + 1; 

110. if err < ioZ, flag = 0; return, end 

111. if iter >= max St, return, end 

112. end 

113. e = Q(:,l)'*r; beta = —e/c(n); 

114. W(:,n) = r + beta*W(:,n); 

115. G(:,n) = W(:,n) + (beta/rho) * G(:,n); 

116. ifn>=2 

117. beta = -Q(:,2) 7 * W(:,n)/c(l); 

118. for s = 1 : n - 2 

119. G(:,n) = G(:,n) + 6eta*G(:,s); 

120. W(:,n) = W(:,n) + beta *£>(:, s); 

121. beta = -Q(:, s + 2)' * W(:, n)/c(s + 1); 

122. end 

123. G(:, n) = G(:, n) + 6eia * G(:,n - 1); 

124. end 

125. gj = M\G(:,n); W(:,n) = A*gJ; 

126. c(n) = Q(:,l)' *W(:,n); 

127. if c(n) == 0, flag = —1; return, end 

128. end 

9.2. ML(n)BiCGStab with Definition (15. 2|) . The following algorithm is a preconditioned 
version of Algorithm 15.11 



Algorithm 9.2. ML(n)BiCGStab with preconditioning associated with (15 .2D . 

Choose an initial guess xo and n vectors qi,q2, • • • , qn- 

Compute r = b - Ax , go = M _1 ro, w = Ag , c = qf w and e = qf r . 
For j = 0,1,2,- ■■ 

For i = 1, 2, • • • , n — 1 

Otjn+i — &jn+i— 1 /Cj'ri+i— 1 ; 

Xjn+i = Xj n+ j_i + <X,- n+ jgj n+ j_i; % g = M _1 g 
Yjn+i — ^jn+i— 1 Qjn+iW jn+i—l; 
&jn+i = q-t+l r in+i; 
//J > 1 
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5{jn+i) _ I . <% aU n +i) _ _ o(jn+i) 

P (j-l)n+i ~ e jn+i/ L(j-l)n+i> '0 Py-l)n+i — PjP(j-i) n+i 

— _l_ R^ n+i ) 

Z » — r jn+i + P( J _ 1 ) n+ jW(j_i) n+ j, 

~ _ o(Jn+i) ~ 

gjn+i — P(j_i) n _|_jg(j-l)n+i; 

For s = i + 1, • • ■ ,n — 1 



o{jn+i) _ jj I qf o{jn+i) _ n(jn+i) 

P (j-l)n+s ~ <ls+l z W c (j-l)n+s> 7° P(,-_i) n+s ~ PjP(j- 

j(jn+i) 



z w - z w + Py_ 1)n+a wy_i) n+a ; 
~ _ ~ , 3(i n +*) ~ 

Kjn+i — gjn+i + P(j_i) n+s g(j-l)n+s> 

End 

gj n+ i = M _1 Z W H gjn+ii 

Pj 

Else 

gjn+i — -M- ^jn+i) 

End 

w jn+i = Agj n+ j/ 

For s = 0, • • • ,i — 1 

oO'n+i) _ h / 
Pj„ +S - -q s+ l W 3 n + ! / C Jn+ s ' 

_ i o0' n +») 

w jn+i — w jn+i + Pj n +s w jn+s; 

gjn+i — gj'n+i + Pjn+s Sjn+sj 

End 
End 

Q^'n+n — ^jra+ra— l/Cj'n+n— 1; 
■^jn+n — ^jn+n— 1 "I - ^jn+nSjn+n— 1> 
^■jn+n — ^jn+n— 1 ^jn+n^jri-l-ii-lj 
Uj'n+n — Uj'ra+ni 

Pj'+l = ~~ (Allj n + n ) Uj n _|_ n /|| AUj n _|_ n ||2,' 
X-jn+n — ^-jn+n Pj+l^-jn+n; 
Yjn+n — Pj+1 Allj n -|_ n + Uj n _|_ n; 

&jn+n — Ql ^jn+m 

o(jn+n) _ / q/ 3(in+n) _ oO'n+n) 

P(j-l)n+n ~~ e jn+n/ t(j-l)n+n, /0 P(j-l)n+n ~ f J 3+ 1 P(j-l)n+n 

— _i_ 50 n + n ) 

*w — rj n+n -t- P(j_i) fl+n w(3-l)n+fl!' 

~ oU n + n ) ~ 

gjn+n — P( J _ 1 ) n+n g(i-l)n+n; 

For s = 1, ■ • ■ , n — 1 

oO'n+n) _ H „ /„ G/ 50'™+™) _ _ o0»+n) 

Pjn+s ~~ Is+l^xu/ t-jn+s; /0 Pjn+s — Fj+l^jn+s 

z ui = z ui ~\~ Pjn^g w jn+s/ 

~ ~ _|_ n(jn+n)~ 

gjn+n — gj'n+n + Pj n + S &jn+s> 

End 

gjn+n — -M- ^"w H gjr'n+n; 

Pj+1 

Cjn+ra — Q.1 Wj n _|_ n , 



Matlab code of Algorithm 19.21 
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1. function [x, err, iter, flag] = mlbicgstab(A, x, b, Q, M, max -it, tol, kappa) 



2. 

3. % input: A: N-by-N matrix. M: N-by-N preconditioner matrix 

4. % Q: N-by-n auxiliary matrix with columns qi, ■ ■ ■ , q n . 

5. % x: initial guess, b: right hand side vector. 

6. % maxjit: maximum number of iterations, tol: error tolerance. 

7. % kappa: (real number) minimization step controller: 

8. % zero, standard minimization 

9. % positive, Sleijpen-van der Vorst minimization 

10. % output: x: solution computed, err: error norm. 

11. % iter: number of iterations performed. 

12. % flag: = solution found to tolerance 

13. % 1 = no convergence given maxAt 

14. % -1 = breakdown 

15. % storage: c: 1 x n matrix. x,r,b,uJ, z: N-by-1 matrices. 

16. % A, M: N-by-N matrices. Q, G, W: N-by-n matrices. 
17. 

18. N = size(A, 2); n = size(Q, 2); 

19. G = zeros{N, n); W = zeros{N, n); % initialize workspace for g, w and 

20. c = zeros (1, n); % end initialization 
21. 

22. iter = 0; flag = 1; bnrml = norm(b); 

23. if bnrml == 0.0, bnrm2 = 1.0; end 

24. r = b — A*x; err = norm(r)/bnrm2; 

25. if err < toZ, /Zag = 0; return, end 
26. 

27. G(:, 1) = M\r; W(:, 1) = A * G(:, 1); c(l) = Q(:, 1)' * W(:, 1); 

28. if c(l) == 0, //a<7 = —1; return, end 

29. e = Q(:,l)'*r; 
30. 

31. for j = : max -it 

32. for i = 1 : n - 1 

33. alpha = e/c(i); 

34. x = x + alpha * G{:,i); 

35. r = r — alpha * W(:, i); 

36. err = norm(r)/bnrm2; 

37. iter = iter + 1; 

38. if err < toZ, flag = 0; return, end 

39. if iter >= maxJt, return, end 
40. 

41. e = Q(:,i + l)'*r; 

42. if j >= 1 

43. beta = -e/c(i + 1); 

44. W(:,i + l) = r + beta*W(:,i + l); 

45. G(:,i + 1) = beta*G(:,i + l); 

46. for s = i + 1 : n — 1 

47. feeta = -Q(:, s + 1)' * W(:, i + l)/c(s + 1); 

48. W(:, i + 1) = W(:, i + 1) + 6eta * W(:, s + 1); 

49. G(:, i + 1) = G(:, i + 1) + beta * G(:, s + 1); 
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50. end 

51. G(:,i + 1) = {M\W(:,i + 1)) + (1/rho) * G(:,i + 1); 

52. else 

53. G(:,i + l) = M\r; 

54. end 

55. W(:,i + 1) = A*G(:,i + l); 

56. for s = : i — 1 

57. 6eta = -Q(:, s + 1)' * W(:, i + l)/c(s + 1); 

58. W(:,i + 1) = W(:,i + l)+6eto*W(:,s + l); 

59. G(:, i + 1) = G(:, i + 1) + 6eto * G(:, s + 1); 

60. end 

61. c(i + l) = Q(:,i + l)' *W(:,i + l); 

62. if c(i + 1) == 0, flag = —1; return, end 

63. end 

64. alpha = e/c(n); 

65. x = x + alpha * G(:,n); 

66. r = r — alpha * W(:,n); 

67. err = norm(r)/bnrm2; 

68. if err < to/, /7ag = 0; iter = iter + 1; return, end 

69. uJ, = M\r; z = A* uJb\ rho = z' * z; 

70. if rho == 0, flag = — 1; return, end 

71. omega = z' * r; 

72. if omega == 0, /Zap = —1; return, end 

73. r/io = —omega/ rho; 

74. if kappa > 

75. omega = omega/ (norm(z) * norm(r)); 

76. absjom = abs(omega); 

77. if abs-om < kappa 

78. r/io = r/io * kappa/ absuom; 

79. end 

80. end 

81. x = x — rho * uJt; 

82. r = r + r/io * 2; 

83. err = norm(r)/bnrm2; 

84. iter = iter + 1; 

85. if err < toZ, /Zap = 0; return, end 

86. if iter >= maxJt, return, end 
87. 

88. e = Q(:, 1)' * r; 6eta = -e/c(l); 

89. W(:,l) = r + 6eto*W(:,l); 

90. G(:,l) = 6eta*G(:,l); 

91. for s = 1 : n — 1 

92. feeta = -Q(:, s + 1)' * W(:, l)/c(s + 1); 

93. W(:,l) = W(:,l) + 6e*a*W(:,s + l); 

94. G(:,l) = G(:,l)+6eta*G(:,s + l); 

95. end 

96. G(:, 1) = (M\W(:, 1)) + * G(:, 1); 1) = A * G(:, 1); 

97. c(l) = Q(:,l)'*W(:,l); 

98. if c(l) == 0, flag = —1; return, end 
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99. end 

A sample run of ML(n)BiCGstab 

1. N = 100; A = randn(N); M = randn(N); b = randn{N, 1); 

2. n = 10; kappa = 0.7; tol = 10~ 7 ; maxJt = 3 * N; 

3. Q = sign{randn(N , n)); x = zeros{N, 1); Q(:, 1) = b — A * x; 

4. [x, err, iter, flag] = mlbicgstab(A, x, b, Q, M, max-.it, tol, kappa); 

Acknowledgements 
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